DESIGN PHILOSOPHY

The Am29050™ Streamlined Instruction Processor is the result of a design philosophy
that recognizes that processor performance must be considered in light of the 
processor's hardware and software environment. The key to maximizing performance 
lies in the realization that the processor is part of an integrated system, and is itself a 
collection of components that must be properly integrated. 

Processor features must be considered not only on their own merits, but also in 
relation to other components of the system. A particular feature that, considered
alone, increases one aspect of processor performance may actually decrease the
performance of the total system, because of the burden that it places elsewhere in the
system. As an illustration, consider the factors involved in the execution time of any
processor task:

TASK TIME = INSTRUCTIONS/TASK * CYCLES/INSTRUCTION * TIME/CYCLE

To minimize the time taken, it is necessary to minimize the above product. This is not
equivalent to minimizing all of the terms that contribute to the product; this, in fact, is 
generally not possible due to the interaction of the terms. 

As an example of the interaction of the above terms, consider the number of
instructions required for a task. An attempt to minimize this number, a more or less
traditional approach to processor architecture design, increases the average number
of cycles required for the execution of an instruction, because of the increased
number of operations performed by each instruction. In addition, cycle time is
increased because of increased instruction-decode time.

A second example of the interaction in the above equation appears in an attempt to
reduce the cycle time through the pipelining of operations. In theory, the cycle time
can be made arbitrarily small by the definition of an arbitrarily large number of pipeline
stages. In practice, at least in the case of general-purpose processors, pipelining
rarely yields much of its potential benefit. This is due to situations where the pipeline
cannot be kept fully occupied, such as when storage references and branches occur.
In these situations, additional pipeline stages increase the number of cycles required
for an operation, and thus affect the CYCLES/INSTRUCTION term.

OPTIMUM PERFORMANCE 

Each of the terms in the above equation has some minimum bound for a given
implementation technology and task. In general, this minimum bound cannot be
approached without an offsetting increase in the other terms, making the overall
product less than optimum. The question then arises: what combination of terms
does yield an optimum product? There are several things to note when answering
this question.

The first observation is that the number of operations underlying a given task is
more or less fixed. Any single processor ultimately limits the time required for a task
because it has a single execution unit and a single instruction stream. The operations
that must be performed are reflected in the INSTRUCTIONS/TASK and
CYCLES/INSTRUCTION terms. These operations may be performed by relatively few
instructions, where each instruction takes multiple cycles to execute, or by a larger
number of instructions, where each takes a single cycle to execute. In the first case,
the instructions are complex; in the second, they are simple.

The point is that the trade-off between simple and complex instructions is not
one-to-one. For example, reducing the number of cycles per instruction by a factor of
three does not increase the number of instructions per task by the same factor. There
are two reasons for this. The first is that, even when an instruction set supports complex
operations, a large proportion of the instructions that are executed perform operations
that could be performed as well by simple instructions. The second is that simple
instructions expose more of the internal processor operation to an optimizing
compiler. This allows the compiler to tailor the organization and sequence of operations
to the task at hand, thereby reducing the total number of instructions executed.

PERFORMANCE LEVERAGE 

Another important observation is that there is a tremendous amount of leverage in the
TIME/CYCLE and CYCLES/INSTRUCTION terms. As they are made smaller, they
have a proportionately greater effect on performance.

For example, a reduction of 10 ns in the cycle time of a processor operating with a
200-ns cycle time shortens execution time by 5%. The same improvement in a
processor operating with a 50-ns cycle time shortens execution time by 20%.

Correspondingly, a reduction of 0.2 in the number of cycles per instruction in a
processor that averages 5 cycles per instruction shortens execution time by 4%.
However, the same reduction shortens execution time by 12.5% in a processor
that averages 1.6 cycles per instruction.
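
The leverage figures can be checked with a few lines of arithmetic. The sketch below is illustrative only; the function names are not from this manual. It computes two measures for each case: the fractional reduction in execution time, which is the measure that reproduces the 5%, 20%, 4%, and 12.5% figures quoted here, and the corresponding gain in throughput (tasks per unit time), which is somewhat larger.

```python
def time_reduction(old, new):
    """Fractional reduction in execution time when one term drops from old to new."""
    return (old - new) / old


def speedup(old, new):
    """Fractional gain in throughput (work per unit time) for the same change."""
    return old / new - 1.0


# (label, value before, value after); the other two terms are held constant.
cases = [
    ("cycle time, ns", 200.0, 190.0),
    ("cycle time, ns", 50.0, 40.0),
    ("cycles/instruction", 5.0, 4.8),
    ("cycles/instruction", 1.6, 1.4),
]

for label, old, new in cases:
    print(f"{label}: {old} -> {new}  "
          f"time -{time_reduction(old, new):.1%}  "
          f"throughput +{speedup(old, new):.1%}")
```

Either way the conclusion is the same: a fixed absolute improvement matters more as the term being improved gets smaller.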

CONCLUSION 

The conclusion is that it is possible, and desirable, to yield somewhat in the number
of instructions executed for a given task, and more than make up for the performance
impact of this increase by reductions in the cycle time and in the number of cycles per
instruction. For example, if both the cycle time and the number of cycles per
instruction are reduced by a factor of three, while the number of instructions for a given
task is allowed to grow by 50%, the resulting task time is reduced by a factor of 6.
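
The factor of 6 follows directly from the task-time product: 1.5 / (3 * 3) = 1/6. A minimal sketch, using hypothetical baseline numbers chosen only to make the ratios visible (they do not describe any real processor):

```python
def task_time(instructions_per_task, cycles_per_instruction, time_per_cycle):
    """TASK TIME = INSTRUCTIONS/TASK * CYCLES/INSTRUCTION * TIME/CYCLE."""
    return instructions_per_task * cycles_per_instruction * time_per_cycle


# Hypothetical baseline: 1,000,000 instructions, 6 cycles each, 75-ns cycle.
baseline = task_time(1_000_000, 6.0, 75e-9)

# Cycle time and cycles/instruction each cut by a factor of three,
# instruction count allowed to grow by 50%.
streamlined = task_time(1_500_000, 6.0 / 3, 75e-9 / 3)

ratio = baseline / streamlined
print(f"baseline {baseline:.3f} s, streamlined {streamlined:.3f} s, ratio {ratio:.2f}")
```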

The Am29050 microprocessor architecture was designed with the above effects in 
mind. Maximum performance is obtained by the optimization of the product of the 
number of instructions per task, the number of cycles per instruction, and the cycle
time, not by minimizing one factor at the expense of the others. This is accomplished 
by careful definition of all processor components. In particular: 

1. The INSTRUCTIONS/TASK term is optimized by the definition of simple
instructions. The processor provides an efficient instruction set and a large number of
general-purpose registers to an optimizing, high-level language compiler. Most
reductions in this term are accomplished by the compiler. The number of
instructions for a given task may be greater than the number of instructions for
processors with complex instruction sets. However, this increase is more than
offset by other improvements in processor performance.

2. The CYCLES/INSTRUCTION term is optimized by the data-flow structure and
performance-enhancing features of the processor. A large amount of processor
hardware is dedicated to achieving an average instruction-execution rate that is
close to single-cycle execution.

3. The TIME/CYCLE term is optimized by the implementation technology, the
processor system interface, and judicious use of pipelining. The simplicity of the
instruction set and processor features helps minimize the cycle time.

Am29050 MICROPROCESSOR USER MANUAL OVERVIEW 

This manual contains information on the Am29050 processor that is essential for 
computer hardware and software architects and system design engineers. Additional 
information is available in the form of data sheets, application notes, and other 
documentation that is provided with software products and hardware-development 
tools. 

The information in this manual is organized into eight chapters, each viewing the 
processor from a different perspective, and each with a specific objective. 

Chapter 1 introduces the features and performance aspects of the Am29050 
microprocessor. 

Chapter 2 contains brief technical descriptions of the processor architecture and 
implementation. 

Chapter 3 describes the details of the Am29050 microprocessor architecture. 

Chapter 4 details the operation of the processor's internal functional units. 

Chapter 5 describes the operation of the external interfaces of the Am29050 
microprocessor. 

Chapter 6 describes the attachment and use of coprocessors for the Am29050 
microprocessor. 

Chapter 7 discusses the implementation of software systems for the processor, 
focusing on programming features that deserve more coverage than is provided by 
other chapters. 

Chapter 8 specifies the instruction set of the Am29050 microprocessor. It describes 
the instruction formats in detail, and provides a detailed description of every 
instruction. 

This manual is organized around readers' concerns and objectives. Each chapter 
focuses on a particular aspect of the processor, and is organized so that it may be 
read independently, insofar as possible. 

For those readers desiring only a brief overview of the Am29050 microprocessor,
Chapters 1 and 2 identify the outstanding features of the processor and briefly
describe its architecture and implementation. These chapters address both software
and hardware concerns.

For software architects and system programmers interested mainly in software-related 
issues, Chapters 3, 7, and 8 provide the necessary information.

For hardware architects and systems hardware designers interested mainly in
hardware-related issues, Chapters 4 and 5 provide most of the required information;
Chapter 8 also provides some related information.

For those readers interested in the coprocessor interface, Chapter 6 describes the
interface from both a software and a hardware point of view.

CHAPTER 1



FEATURES AND PERFORMANCE 



This chapter provides an evaluation of the Am29050 microprocessor as an aid in
considering a particular application. A detailed technical description of the Am29050
microprocessor is contained in subsequent chapters. This chapter informally
describes the features of the processor, concentrating on features which distinguish the
Am29050 microprocessor from other available processors.



1.1 DISTINCTIVE CHARACTERISTICS

Full 32-bit architecture 

Double-precision, Floating-Point Arithmetic Unit on-chip 

CMOS technology/TTL-compatible 

32 million instructions per second sustained at a 40-MHz operating frequency 

1.25 clock cycles per instruction average

4-Gbyte virtual address space

192 general-purpose registers 

Three-address instruction architecture 

Non-multiplexed, pipelined address, instruction and data buses 

Concurrent instruction and data accesses 

Burst-mode access support 

1024-byte Branch Target Cache™ memory

4-entry Physical Address Cache memory 

64-entry Memory Management Unit on-chip 

Demand paging 

Fully pipelined 

On-chip Timer Facility 

On-chip clock generation 

Enhanced debugging support 

Master/slave chip output checking 



1.2 INTRODUCTION

The Am29050 Streamlined Instruction Processor is a high-performance, 
general-purpose, 32-bit microprocessor implemented in complementary metal-oxide 
semiconductor (CMOS) technology. It supports a variety of applications, using a 
flexible architecture and rapid execution of simple instructions which are common to a 
wide range of tasks.

The Am29050 microprocessor extends the 29K™ Family of processors with a
high-performance, pipelined, on-chip floating-point unit. The floating-point unit 
performs IEEE-compatible, single-precision and double-precision arithmetic at a peak 
rate of 80 million floating-point operations per second (MFLOPS) at 40 MHz. The 
Am29050 microprocessor also has features to improve the performance of loads and 
branches, allowing sustained integer performance of 32 million instructions per 
second (MIPS) at 40 MHz. 

The Am29050 microprocessor is fully hardware- and software-compatible with the 
Am29000™ microprocessor. It can be used in existing Am29000 microprocessor
applications without hardware or software modifications. It can bring a dramatic 
increase in performance to floating-point-intensive applications, particularly graphics 
and laser-printer applications. 

The Am29050 microprocessor is packaged in a 169-pin, pin-grid-array (PGA)
package, with 141 signal pins, 27 power and ground pins, and one alignment pin. A
representative system diagram is shown in Figure 1-1.

1.3 PERFORMANCE OVERVIEW

The Am29050 microprocessor provides a significant margin of performance over
other processors in its class, since the majority of processor features were defined
with the maximum achievable performance in mind. This section describes the
features of the Am29050 microprocessor from the point of view of system performance.

1.3.1 Cycle Time 

The Am29050 microprocessor is implemented in CMOS technology, with a 0.8 micron 
effective transistor-channel length. This technology allows the processor to operate at 
a frequency of 40 MHz. The processor cycle time is a single, 25-ns clock period. The 
processor interface drivers can drive 80-pF loads at this frequency. 

1.3.2 Four-Stage Pipeline

The Am29050 microprocessor utilizes a four-stage pipeline for integer operations,
allowing it to execute one integer instruction every clock cycle. The processor can
complete an instruction on every cycle, even though four cycles are required from the
beginning of an instruction to its completion.

Floating-point operations are pipelined to a depth determined by the operation
latency, and are overlapped with integer operations. A floating-point operation and an
integer operation can complete at the same time without stalling the pipeline.

At a 40-MHz operating frequency, the maximum instruction execution rate is 40 
million instructions per second (MIPS). For most other processors, the maximum 
MIPS rate has little meaning, because it can be achieved only under special 
circumstances. However, the Am29050 microprocessor pipeline is designed so that 
the Am29050 microprocessor can operate at the maximum instruction-execution rate
a significant portion of the time. 
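
The relationship between clock frequency, cycle time, and instruction rate can be sketched as follows. The 40-MHz clock, the four-stage pipeline, and the 1.25 cycles-per-instruction average are figures from this chapter; the function names, and the assumption of a stall-free instruction sequence in `pipeline_cycles`, are illustrative only.

```python
def cycle_time_ns(clock_mhz):
    """Clock period in nanoseconds for a given frequency in MHz."""
    return 1000.0 / clock_mhz


def mips(clock_mhz, cycles_per_instruction):
    """Millions of instructions per second at a given average cycles/instruction."""
    return clock_mhz / cycles_per_instruction


def pipeline_cycles(instruction_count, stages=4):
    """Cycles to complete a stall-free sequence: fill the pipeline once,
    then complete one instruction per cycle (idealized assumption)."""
    if instruction_count == 0:
        return 0
    return instruction_count + stages - 1


print(cycle_time_ns(40.0))     # clock period at 40 MHz
print(mips(40.0, 1.0))         # peak rate: one instruction per cycle
print(mips(40.0, 1.25))        # rate at the quoted 1.25 cycles/instruction average
print(pipeline_cycles(1000))   # 1000 instructions through a four-stage pipeline
```

The sustained figure of 32 MIPS quoted in Section 1.1 is simply the 40-MHz clock divided by the 1.25 cycles-per-instruction average.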

Pipeline interlocks are implemented by processor hardware, including those required
for floating-point operations. Except for a few special cases, it is not necessary to
rearrange programs to avoid pipeline dependencies, although this is sometimes
desirable for performance.

Figure 1-1 Simplified System Diagram

1.3.3 System Interface

One of the most difficult tasks in the definition of a high-speed microprocessor is the 
definition of an off-chip interface which supports the operating frequency of the 
processor, and does not restrict the ability of the processor to fetch instructions and 
data. If the external interface of a microprocessor cannot support an instruction fetch 
rate of one instruction every cycle, there is little prospect that the processor will 
execute at this rate, even though it supports such a rate internally.

The Am29050 microprocessor accesses external instructions and data using three
non-multiplexed buses. These buses are referred to collectively as the channel. The
channel protocol minimizes the logic chains involved in a transfer, and provides a
maximum transfer rate of 320 Mbytes/s at 40 MHz.

1.3.3.1 SEPARATE ADDRESS, INSTRUCTION, AND DATA BUSES

The Am29050 microprocessor incorporates two 32-bit buses for instruction and data
transfers, and a third address bus which is shared between instruction and data
accesses. This bus structure allows simultaneous instruction and data transfers, even
though the address bus is shared. The channel achieves the performance of four
separate 32-bit buses at a much reduced pin count.

1.3.3.2 PIPELINED ADDRESSES

The Am29050 microprocessor address bus is pipelined, so that it can be released
before an instruction or data transfer is completed. This allows a subsequent access
to begin before the first has completed, and allows the processor to have two
accesses in progress simultaneously.

1.3.3.3 SUPPORT OF BURST DEVICES AND MEMORIES

Burst-mode accesses provide high transfer rates for instructions and data at 
sequential addresses. For such accesses, the address of the first instruction or datum 
is sent, and subsequent requests for instructions or data at sequential addresses do 
not require additional address transfers. These instructions or data are transferred 
until either party involved in the transfer terminates the access.

Burst-mode accesses can occur at the rate of one access per cycle after the first 
address has been processed. At 40 MHz, the maximum achievable transfer 
bandwidth for either instructions or data is 160 Mbytes/s.
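
The bandwidth figures follow from the bus width and the clock rate: a 32-bit bus moving one word per 40-MHz cycle carries 4 bytes * 40 million transfers per second. A minimal sketch (the function name is illustrative, and one transfer per cycle is the burst-mode best case described above):

```python
def burst_bandwidth_mbytes_per_s(bus_width_bits, clock_mhz, transfers_per_cycle=1):
    """Peak transfer rate in Mbytes/s for a bus moving one word per transfer."""
    bytes_per_transfer = bus_width_bits // 8
    return bytes_per_transfer * clock_mhz * transfers_per_cycle


per_bus = burst_bandwidth_mbytes_per_s(32, 40)   # instruction bus or data bus alone
channel = 2 * per_bus                            # both buses transferring concurrently

print(per_bus, channel)
```

The channel maximum is twice the per-bus figure because the instruction and data buses can transfer simultaneously.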

Burst-mode accesses may occur to input/output devices, if the system design permits.

1.3.3.4 INTERFACE TO FAST DEVICES AND MEMORIES

The processor can be interfaced to devices and memories which complete accesses 
within one cycle. The channel protocol takes maximum advantage of such devices 
and memories by allowing data to be returned to the processor during the cycle in 
which the address is transmitted. This allows a full range of memory-speed trade-offs 
to be made within a particular system. 

1.3.4 Register File

An on-chip Register File containing 192 general-purpose registers allows most 
instruction operands to be fetched without the delay of an external access. The 
Register File incorporates several features which aid the retention of data required by 
an executing program. Because of the number of general-purpose registers, the 
frequency of external references for the Am29050 microprocessor is significantly 
lower than the frequency of references in processors having only 16 or 32 registers. 

Four-port access to the Register File allows two 64-bit source operands to be fetched
in one cycle, while two previously computed results are written; one write port is for
integer operations, and the other port is for floating-point operations. Four 64-bit
internal buses prevent contention in the routing of operands. All operand fetches and 
result write-backs for instruction execution can be performed in a single cycle. 

The registers allow efficient procedure linkage, by caching a portion of a compiler's 
run-time stack. On the average, procedure calls and returns can be executed 5 to 10 
times faster (on a cycle-by-cycle basis) than in processors which require the
implementation of a run-time stack in external memory (with the attendant loading and
storing of registers on procedure call and return). 

1.3.5 Instruction Execution 

The Am29050 microprocessor uses an Arithmetic/Logic Unit, a Field Shift Unit, and a
Prioritizer to execute most instructions. Each of these is organized to operate on
32-bit operands, and provide a 32-bit result. All operations are performed in a single
cycle.

Floating-point operations are performed in an on-chip Floating-Point Unit. The 
floating-point unit performs 32-bit, single-precision and 64-bit, double-precision
computations. Most of the time, floating-point operations are performed in parallel with 
integer operations and other floating-point operations. 

Instruction operations are overlapped with operand fetch and result write-back to the 
Register File. Pipeline forwarding logic detects pipeline dependencies and routes data
as required, avoiding delays which might arise from these dependencies. 

1.3.6 Branch Target Cache Memory

In general, the Am29050 microprocessor meets its instruction bandwidth
requirements via instruction prefetching. However, instruction prefetching is
ineffective when a branch occurs. The Am29050 microprocessor therefore
incorporates a 64- or 128-entry (configurable at run time) Branch Target Cache
memory to supply instructions for a branch, if this branch has been taken
previously, while a new prefetch stream is established.

If branch-target instructions are in the Branch Target Cache memory, branches
execute in a single cycle. This has a very positive effect on processor performance,
due to the amount of time the processor could otherwise be idle waiting for the new
instruction stream.

As an example, consider that successful branches are 20% of a dynamic instruction
mix, and that five cycles are required to restart the processor pipeline after a branch.
For 20% of the instructions, the processor would take one cycle to execute the branch
instruction and wait five cycles to refill the instruction pipeline. Each such branch
would therefore cost six cycles. If the remaining 80% of the instructions
require a single cycle to execute, the latency involved in branching would reduce the
average execution rate from one cycle per instruction to two, thus halving the
performance.

The Branch Target Cache memory in the Am29050 microprocessor has an average
hit rate of 80%. In other words, it eliminates the branch latency for 80% of all
successful branches on the average.
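
The branch example can be written as a weighted average. The sketch below uses the figures from the example in this section (20% successful branches, a five-cycle refill, an 80% hit rate); the model is deliberately simplified, ignoring delay instructions and all other stalls, and the resulting 1.2 cycles per instruction is derived from that model rather than quoted from this manual.

```python
def average_cpi(branch_fraction, refill_cycles, btc_hit_rate=0.0):
    """Average cycles/instruction when only taken branches cost extra cycles.

    A taken branch costs 1 cycle on a Branch Target Cache hit and
    1 + refill_cycles on a miss; every other instruction costs 1 cycle.
    """
    hit_cost = 1.0
    miss_cost = 1.0 + refill_cycles
    branch_cost = btc_hit_rate * hit_cost + (1.0 - btc_hit_rate) * miss_cost
    return branch_fraction * branch_cost + (1.0 - branch_fraction) * 1.0


no_cache = average_cpi(0.20, 5)           # every taken branch pays the refill
with_cache = average_cpi(0.20, 5, 0.80)   # 80% of taken branches hit the cache

print(f"without cache: {no_cache:.2f} cycles/instruction")
print(f"with cache:    {with_cache:.2f} cycles/instruction")
```

Under these assumptions the cache recovers most of the performance lost to branching: the average falls from two cycles per instruction to about 1.2.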



1.3.7 Branching 



Branch conditions in the Am29050 microprocessor can be based on Boolean data 
contained in general-purpose registers, as well as on arithmetic condition codes. 
Using a condition-code register for the purpose of branching can inhibit certain 
compiler optimizations, because the condition-code register can typically be modified 
by many different instructions. It can be difficult for an optimizing compiler to schedule
this shared use. Since it can treat branch conditions like any other instruction 
operands, the Am29050 microprocessor avoids this problem.

The Am29050 microprocessor executes branches in a single cycle, for those cases
where the target of the branch is in the Branch Target Cache memory. The 
single-cycle branch is unusual for a pipelined processor, and is due to processor 
hardware which allows much of the branch instruction operation to be performed early 
in the execution of the branch. Single-cycle branching has a dramatic effect on 
performance, since successful branches typically represent 15% to 25% of a 
processor's instruction mix. 

The techniques used to achieve single-cycle branching also minimize the execution 
time of branches in those cases where the target is not in the Branch Target Cache 
memory. To keep the pipeline operating at the maximum rate, the instruction following 
the branch, referred to as the delay instruction, is executed regardless of the outcome 
of the branch. An optimizing compiler can define a useful instruction for the delay
instruction in approximately 90% of branch instructions, thereby increasing the
performance of branches.

1.3.8 Loads and Stores

The performance degradation of load and store operations is minimized in the
Am29050 microprocessor by overlapping them with instruction execution, by taking
advantage of pipelining, and by organizing the flow of external data onto the
processor so that the impact of external accesses is minimized.

1.3.8.1 OVERLAPPED LOADS AND STORES

In the Am29050 microprocessor, a load or store is performed concurrently with 
execution of instructions which do not have dependencies on the load or store 
operation. An optimizing compiler can schedule loads and stores in the instruction
sequence so that, in most cases, data accesses are overlapped with instruction 
execution. 

Overlapped load and store operations can achieve up to a 30% improvement in 
performance when data memory has a two-cycle access time. Processor hardware 
detects dependencies while overlapped loads and stores are being performed, so 
dependencies have no software implications. 

A classical problem in the implementation of overlapped loads and stores is that of
dealing with address-translation exceptions in a demand-paged environment. Overlap
is not possible if any load or store which encounters an address-translation exception
must be restarted by the re-execution of the initiating instruction. In this case, the
processor would have to hold instruction execution until the success of every load or
store were ensured. The Am29050 microprocessor exception restart mechanism
automatically saves information required to restart any load or store, until the 
operation successfully completes. Thus, it allows the overlapped execution of loads 
and stores while properly handling address-translation exceptions. 

A second problem in the implementation of overlapped loads concerns the handling of 
data which is returned to the processor upon completion of the load. This data must 
be written to the register file, but it contends for register-file write cycles with other
instructions which are being overlapped with the load. This contention may be
eliminated by adding a special write port to the register file. However, due to the size
of the register file in the Am29050 microprocessor, a fifth port for writing incoming
load data is not economical. 

The Am29050 microprocessor data-flow organization avoids the one-cycle penalty
which would result from the contention between load data and the results of
overlapped instruction execution. Load data is buffered in a latch while awaiting an
opportunity to be written into the register file. This opportunity is guaranteed to arise
before the next load is executed. While the data is buffered in this latch, it may be used
as an instruction operand in place of the destination register for the load.

1.3.8.2 EARLY LOADS

The early load feature, incorporating a 4-entry Physical Address Cache memory,
speeds up the execution of load operations by making the physical address of the
load available at the end of the decode cycle of the load instruction. At the beginning
of the next cycle, when the load enters the execute stage, the physical address
appears on the channel. In effect, early loads reduce the effective access time of the
external memory by one cycle.

1.3.8.3 LOAD MULTIPLE AND STORE MULTIPLE

These instructions allow the transfer of the contents of multiple registers to or from
external memories or devices. This transfer can occur at a rate of one register-content 
per cycle. 

The advantage of Load Multiple and Store Multiple is best seen in task switching, 
register-file saving and restoring, and in block data moves. In many systems, such 
operations require a significant percentage of execution time. 

The load-multiple and store-multiple sequences are interruptible, so that they do not 
affect interrupt latency. 

1.3.8.4 FORWARDING OF LOAD DATA

Data which is sent to the processor at the completion of a load is forwarded directly to
the appropriate execution unit if the data is required immediately by an instruction.
This avoids the common one-cycle delay from bus transfer to use of data, and
reduces the access latency of external data by one cycle.

1.3.9 Memory Management

A 64-entry Translation Look-Aside Buffer (TLB) and two Region Mapping registers on 
the Am29050 microprocessor perform virtual-to-physical address translation, avoiding 
the cycle which would be required to transfer the virtual address to an external TLB. A 
number of enhancements improve the performance of address translation: 

1. Pipelining: The operation of the TLB is pipelined with other processor operations.

2. Early Address Translation: Address translations for load, store, and branch
instructions occur during the cycle in which these instructions are executed. This
allows the physical address to be transferred externally in the next cycle.

3. Region Mapping: The Region Mapping registers permit efficient mapping of
large, contiguous regions of memory. This is useful for code libraries and large
data structures; these can appear in a virtual address space without paging
overhead.

4. Task Identifiers: Task Identifiers allow TLB entries to be matched to different
processes, so that TLB invalidation is not required during task switches.

5. Least-Recently Used Hardware: This hardware allows immediate selection of a
TLB set to be replaced.

6. Software Reload: Software reload allows the operating system to use a
page-mapping scheme which is best matched to its environment.
Paged-segmented, one-level-page mapping, two-level-page mapping, or any
other user-defined page-mapping scheme can be supported. Because Am29050
microprocessor instructions execute at an average rate of nearly one instruction
per cycle, software reload has a performance approaching that of hardware TLB
reload.
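
The interplay of task identifiers, least-recently-used replacement, and software reload can be illustrated with a toy model. This is a conceptual sketch only, not the processor's actual TLB organization: it assumes a fully associative 64-entry structure with exact LRU ordering and a 4-Kbyte page size, and every name in it (`Tlb`, `TlbEntry`, `translate`, `reload`) is hypothetical.

```python
from dataclasses import dataclass

PAGE_BITS = 12  # assumed 4-Kbyte pages, for illustration only


@dataclass
class TlbEntry:
    task_id: int
    virtual_page: int
    physical_page: int


class Tlb:
    """Toy TLB: entries are tagged with a task identifier, so a task switch
    needs no invalidation; a miss is resolved by a software reload routine."""

    def __init__(self, capacity=64):
        self.capacity = capacity
        self.entries = []  # ordered from least to most recently used

    def translate(self, task_id, virtual_address):
        """Return the physical address, or None on a miss."""
        vpage = virtual_address >> PAGE_BITS
        offset = virtual_address & ((1 << PAGE_BITS) - 1)
        for i, entry in enumerate(self.entries):
            if entry.task_id == task_id and entry.virtual_page == vpage:
                self.entries.append(self.entries.pop(i))  # mark most recent
                return (entry.physical_page << PAGE_BITS) | offset
        return None

    def reload(self, task_id, virtual_page, physical_page):
        """Software reload: the caller decides the mapping (any page-table
        scheme); the least recently used entry is evicted when full."""
        if len(self.entries) >= self.capacity:
            self.entries.pop(0)
        self.entries.append(TlbEntry(task_id, virtual_page, physical_page))


tlb = Tlb()
tlb.reload(task_id=1, virtual_page=0x10, physical_page=0x80)
tlb.reload(task_id=2, virtual_page=0x10, physical_page=0x90)
print(hex(tlb.translate(1, 0x10ABC)))   # task 1 mapping of virtual page 0x10
print(hex(tlb.translate(2, 0x10ABC)))   # task 2 mapping, no invalidation needed
print(tlb.translate(3, 0x10ABC))        # miss: software would reload here
```

Because both tasks keep their entries in the TLB at once, switching between them requires no flush; the task identifier alone selects the correct mapping.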

1.3.10 Interrupts and Traps 

When the Am29050 microprocessor takes an interrupt or trap, it does not 
automatically save its current state information in memory. This greatly improves 
the performance of temporary interruptions such as TLB reload or other simple 
operating-system calls which require no saving of state information. 

In cases where the processor state must be saved, the saving and restoring of state 
information is under the control of software. The methods and data structures used to 
handle interrupts, and the amount of state saved, may be tailored to the needs of a
particular system. 

Interrupts and traps are dispatched through a 256-entry Vector Area, which directs 
the processor to a routine to handle a given interrupt or trap. The Vector Area may be 
relocated in memory by the modification of a processor register. There may be 
multiple Vector Areas in the system, though only one is active at any given time. 

The Vector Area is either a table of pointers to the interrupt and trap handlers, or a 
segment of instruction memory (possibly read-only memory) containing the handlers 
themselves. The choice between the two possible Vector Area definitions is
determined by the cost/performance trade-offs made for a particular system.

If the Vector Area is a table of vectors in data memory, it requires only 1 Kbyte of
memory. However, this structure requires that the processor perform a vector fetch 
every time an interrupt or trap is taken. The vector fetch requires at least 3 cycles, in 
addition to the number of cycles required for the basic memory access. 

If the Vector Area is a segment of instruction memory, it requires a maximum of 64 Kbytes
of memory. The advantage of this structure is that the processor begins the execution 
of the interrupt or trap handler in the minimum amount of time. 
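
The two Vector Area sizes follow from the 256-entry count. The sketch below assumes 32-bit (4-byte) pointers and instructions, consistent with the 32-bit buses described earlier; the address-computation helpers and their names are hypothetical illustrations, not the processor's documented dispatch mechanism.

```python
VECTOR_ENTRIES = 256
WORD_BYTES = 4  # assumed: 32-bit handler pointers and 32-bit instructions

# Table of pointers: one word per vector.
table_bytes = VECTOR_ENTRIES * WORD_BYTES

# Segment of instruction memory: the stated 64-Kbyte maximum divided evenly.
segment_bytes = 64 * 1024
bytes_per_handler = segment_bytes // VECTOR_ENTRIES
instructions_per_handler = bytes_per_handler // WORD_BYTES


def table_entry_address(base, vector_number):
    """Address of the pointer for a vector in the table form (hypothetical)."""
    return base + vector_number * WORD_BYTES


def handler_block_address(base, vector_number):
    """Address of the handler block in the instruction-segment form (hypothetical)."""
    return base + vector_number * bytes_per_handler


print(table_bytes, bytes_per_handler, instructions_per_handler)
```

Under these assumptions the table form occupies 1 Kbyte, while the segment form leaves room for 64 instructions per vector. The trade-off is memory against the extra vector fetch on every interrupt or trap.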

1.4 OPTIMIZING COMPILERS

The number of instructions used to perform a given task is minimized by optimizing 
compilers which are supplied for the Am29050 microprocessor. A full discussion of 
optimizing-compiler technology is beyond the scope of this manual, but there are a 
few concepts which should be mentioned here, because the Am29050 microprocessor
was designed to be an excellent target for optimizing compilers.

1.4.1 Optimizing-Compiler Overview 

In addition to performing the same tasks as any other compiler, an optimizing 
compiler rearranges the generated code to minimize its size and execution time. This 
optimization occurs after the initial phases of code generation have been completed. 
The optimizer inspects large portions of the compiled program for frequently occurring 
cases where the compiled results can be improved. 

Many optimization opportunities arise precisely because the code is compiler gener- 
ated. Code translation is an automated process, so the initial phases of the compiler 
often generate code that is much less than optimum. However, the optimizer can 
produce results which are often better than those produced by human assembly- 
language programmers, because it can deal with large portions of the program and an 
immense amount of data concerning program behavior. 
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1.4.2 Optimizing-Compiler Operation 

Conceptually, the optimizer arranges program flow and the creation, modification, and 
use of program data to minimize the amount of time required to perform a given task. 
The reduction in program space is a normal side-benefit of the reduction in execution
time. The optimizer is concerned not only with data explicit in the high-level program, 
but also with data created by other phases of the compiler in order to properly trans- 
late the program (for example, temporary values created during the evaluation of 
expressions). Optimization involves the following sorts of operations: 

1. Reusing results rather than repeating computations. The optimizer attempts to 
eliminate redundant computations by performing a computation once, and saving 
the result for later use. Often these redundant computations are not apparent in 
the original program, but are created by the underlying definitions of high-level 
operations. 

2. Reducing the amount of code executed within loops. In many cases, only a few 
computations change on different loop iterations. The optimizer attempts to 
reduce the amount of work performed within loops to a minimum, by moving 
loop-invariant computations outside of loops.

3. Replacing slow operations by faster ones. The optimizer can recognize special 
cases of multiply and divide, for example, and replace them with faster shift and 
add instructions. The slow operations, again, often are generated by earlier
phases of the compiler because these operations are most general, and the early 
code-generation phases cannot recognize the special cases which allow the 
operations to be replaced with faster ones. 

4. Allocating processor registers so that they contain frequently used data. This 
reduces the number of relatively slow memory references, and replaces them by 
faster register references. 

5. Scheduling the execution of instructions. The optimizer attempts to move 
instructions to a point in the program flow where they create fewer problems for 
the processor pipeline. For example, a register load or a floating-point operation 
may be moved to a point in the instruction sequence where its execution can be
overlapped with other instructions. 
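Operations 2 and 3 above can be illustrated with a small Python sketch. The function names and the sample computation are hypothetical and chosen only for illustration; a real optimizer performs these transformations on its internal representation, not on source text.

```python
def scale_naive(values, width, height):
    """Straightforward translation: everything is recomputed per iteration."""
    out = []
    for v in values:
        area = width * height        # loop-invariant: same on every iteration
        out.append(v * 8 + area)     # multiply by a power of two
    return out


def scale_optimized(values, width, height):
    """Same result after two hand-applied optimizations."""
    area = width * height            # hoisted out of the loop (operation 2)
    out = []
    for v in values:
        out.append((v << 3) + area)  # v * 8 replaced by a shift (operation 3)
    return out
```

Both functions return the same list for any integer inputs; the second simply performs less work inside the loop.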

Most optimizations performed rely heavily on two types of information collected by 
the optimizer: the first type deals with program flow, and the second with data 
dependencies which arise because of the program flow. The optimizer can tailor the 
code to the high-level task being compiled, not because it understands the task being
performed by the high-level program, but because it understands the dependencies
which arise in the generated code. As a result, it can adjust the instruction sequence
to minimize the performance impact of these dependencies. 

It is important to note that the optimizer does not directly optimize a given program, 
but rather optimizes a special representation of the program which is suitable for 
analysis and modification by the optimizer, which is, after all, just another program. 
The key to optimization is that this representation be easy to analyze for program 
and data-flow information, and that it be easy to rearrange when optimizations are
performed. 
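As a rough sketch of such a representation, the Python fragment below models a program as a list of three-address operations and collects the data dependencies between them. The `Op` type, the operation names, and the `true_dependencies` helper are assumptions made for this example; they are not taken from any particular compiler.

```python
from typing import List, NamedTuple, Set, Tuple


class Op(NamedTuple):
    """One three-address operation: dest = opcode(src_a, src_b)."""
    opcode: str
    dest: str
    src_a: str
    src_b: str


def true_dependencies(ops: List[Op]) -> Set[Tuple[int, int]]:
    """Return (producer, consumer) index pairs where the consumer reads
    the value most recently written by the producer."""
    last_writer = {}          # name -> index of the operation that last wrote it
    deps = set()
    for i, op in enumerate(ops):
        for src in (op.src_a, op.src_b):
            if src in last_writer:
                deps.add((last_writer[src], i))
        last_writer[op.dest] = i
    return deps


program = [
    Op("load", "t0", "a", "zero"),   # t0 = mem[a]
    Op("add",  "t1", "t0", "b"),     # reads t0: depends on operation 0
    Op("sll",  "t2", "c", "two"),    # independent of operations 0 and 1
    Op("add",  "t3", "t1", "t2"),    # depends on operations 1 and 2
]
deps = true_dependencies(program)
```

Because operation 2 depends on neither operation 0 nor operation 1, a scheduler working from this information could place it between the load and the first use of `t0`, overlapping the load with useful work.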
1.4.3 The Am29050 Microprocessor and Optimizing Compilers

1.4.3.1 GENERAL PRINCIPLES 

The primary principle behind the Am29050 microprocessor instruction set is that it 
matches the internal representation used by optimizing compilers to perform 
optimization. As discussed above, this representation is not arbitrary, but is rather 
strictly defined by the optimization algorithms. 

It is important to realize that optimizations performed for the Am29050 microprocessor 
would have limited effectiveness if applied to so-called complex-instruction pro- 
cessors. There are several fundamental problems that limit the effectiveness of 
optimizations for these other processors. 

The first problem with complex-instruction sets is that they normally provide a variety 
of instruction sequences which perform the same function as a sequence of instruc- 
tions in the compiler's internal representation, but do not match it exactly. The 
trade-offs made by a compiler to decide among the available choices can be very 
complex. 

In the first place, it is difficult for the compiler to determine the difference in execution 
time between multiple instruction sequences, because of the amount of information 
involved. For example, just changing the addressing mode of an instruction can 
change the execution time. This is further complicated in the cases where the 
compiled program is to be run on different implementations of the same processor, 
where execution times can depend on the implementation. If there is only one 
instruction sequence to choose from, and if all instructions execute in a single cycle, 
this problem is reduced greatly. 

During the generation of code for a complex-instruction processor, it is nearly 
impossible to guarantee that the choice of a given code sequence will not force a 
less-than-optimum choice of code at some later point in the translation. Restrictions 
arise late in translation because of decisions made earlier. Often, these restrictions 
arise because of interactions between instructions; they are especially severe when 
instructions operate only on a specific register or group of registers. 

An additional problem with complex instruction sets is that optimizations applied to 
them do not necessarily save execution time. An optimization may not be reflected in 
the final compiled code, because the instruction set may inhibit the realization of the 
optimization. However, in the case of the Am29050 microprocessor, an optimization is 
guaranteed to eliminate one or more execution cycles, because all processor 
operations are exposed to the compiler. 

The greatest benefit of exposing all processor operations to the compiler appears 
within loops, which is where processors spend a great deal of their execution time. 
The problem with complex instruction sets here is that, when an instruction set forces 
multiple operations with one instruction, the processor spends much time performing
redundant computations within loops. Many times, the redundant computations are
performed by microcode, which cannot detect that a computation is loop-invariant,
because it knows nothing of loops. The compiler is in no position to do much about 
this, because it cannot remove the loop-invariant computations from the 
micro-sequence; it is forced to accept the definitions of the instructions as they are. 

If an instruction set is defined so that all hardware-level operations are available to the 
compiler, the compiler is free to construct any sequence of these operations. This 
allows the movement of loop-invariant computations out of loops, which can result in 
tremendous performance improvements.
1.4.3.2 SPECIAL Am29050 MICROPROCESSOR FEATURES

In addition to the above considerations, there are several other central principles 
behind the definition of the Am29050 microprocessor. 

The Am29050 microprocessor instruction set reduces the number of instructions 
required for most general-purpose tasks by providing a complete set of operations. 
The instruction set is streamlined, but there is no attempt to minimize the number of
instructions. Rather, the goal is to minimize the number of instructions required to 
execute most high-level language programs. 

With a few minor exceptions, Am29050 microprocessor integer instructions execute in
a single cycle. As a result, the performance of an Am29050 microprocessor instruc-
tion sequence is very easy to predict, simplifying the task of compiler instruction-
selection. In addition, single-cycle instruction execution allows the Am29050
microprocessor to take the maximum advantage of a high-performance system 
design. Instructions are executed at approximately the rate at which they are supplied 
to the processor. The Am29050 microprocessor does not artificially constrain the 
instruction-execution rate by forcing instructions to require multiple cycles for 
execution, except in the unavoidable case of floating-point operations. 

The Am29050 microprocessor contains a large number of registers which facilitate 
compiler optimizations. These registers allow frequently used variables to be 
accessed quickly, provide a large number of temporary locations for the reuse of 
computational results, and simplify inter-procedural communication. The compiler is
free to allocate these registers as required to improve performance. Register
allocation is relatively simple, because there is such a large number of registers.

For other processors which have fixed register-addressing, a compiler has difficulties 
allocating the usage of registers, because registers must be allocated statically at 
compile time. Procedure calls present the greatest difficulty. It is impossible for the
compiler to determine exactly which procedures will be called during execution, and in
what order they will be called. Thus, it is impossible to precisely allocate the usage of
registers across procedure-call boundaries. 

Since the Am29050 microprocessor local registers are addressed relative to a Stack 
Pointer, compiler register-allocation is simplified. The local registers are allocated
dynamically during execution. Thus, the compiler need not be concerned about the 
allocation of registers across procedure boundaries; this is handled automatically by 
the local-register addressing. 

Am29050 microprocessor pipelining is exposed to the compiler in the form of delayed
branches, overlapped loads and stores, and overlapped floating-point operations. The
compiler is free to arrange instructions to reduce the performance impact of the
processor pipeline. However, the compiler arranges instructions only because of the
performance benefits. Pipeline interlocks in the Am29050 microprocessor guarantee
correct operation in any case. 
ARCHITECTURE HIGHLIGHTS



This chapter gives a brief overview of the Am29050 microprocessor architecture, 
grouped into programming-related features, hardware features, and system 
interfaces. The technical information given in this chapter is also contained in 
subsequent chapters. Much of the detail is omitted here, since the objective is to 
provide a framework for understanding the information in later chapters. 

Where appropriate, section titles in this chapter are followed by references to sections 
appearing in subsequent chapters. The referenced sections contain related detailed 
information. 

2.1 PROGRAMMER REFERENCE OVERVIEW 

This section gives a brief description of the Am29050 microprocessor from a program- 
mer's point of view. It introduces the processor's program modes, registers, and 
instructions. An overview of the processor's data formats and handling is given. This 
section also briefly describes interrupts and traps, memory management, and the 
coprocessor interface. Finally, the Timer Facility and Trace Facility are introduced. 

2.1.1 Program Modes (see Section 3.1)

There are three mutually exclusive modes of program execution: the Supervisor 
mode, the User mode, and the Monitor mode. In the Supervisor mode, executing 
programs have access to all processor resources. In the User mode, certain proces- 
sor resources may not be accessed; any attempted access causes a trap. The Moni- 
tor mode allows debugging of both User and Supervisor code. 

2.1.2 Visible Registers (see Section 3.2) 

The Am29050 microprocessor incorporates four classes of registers which are ac- 
cessed and manipulated by instructions: general-purpose registers, floating-point 
accumulator registers, special-purpose registers, and Translation Look-Aside Buffer 
(TLB) registers. 

2.1.2.1 GENERAL-PURPOSE REGISTERS (see Section 3.2.1)

The Am29050 microprocessor has 192 general-purpose registers. General-purpose 
registers are not dedicated to any special use, and are available for any appropriate 
program use. 

Most processor instructions are three-address instructions. An instruction specifies 
any three of the 192 registers for use in instruction execution. Normally, two of these
registers contain source-operands for the instruction, and a third stores the result of 
the instruction. 

The 192 registers are divided into 64 global and 128 local registers. Global registers 
are addressed with absolute register numbers, while local registers are addressed 
relative to an internal Stack Pointer. 
For fast procedure calling, a portion of a compiler's run-time stack can be mapped
into the local registers. Statically allocated variables, temporary values, and 
operating-system parameters are kept in the global registers. 

The Stack Pointer for local registers is mapped to Global Register 1. The Stack
Pointer is a full 32-bit virtual address for the top of the run-time stack.
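The effect of Stack-Pointer-relative addressing can be sketched as follows. This is a minimal model, not a definition of the hardware: it assumes that the local registers occupy absolute register numbers 128 through 255 and that bits 8 to 2 of the Stack Pointer (its word index) are added to the local register number modulo 128. Both details, and the function name, are assumptions made for this example; Section 3.2.1 gives the actual definition.

```python
NUM_LOCAL = 128      # number of local registers (from the text)
LOCAL_BASE = 128     # assumed absolute number of the first local register


def local_to_absolute(local_num: int, stack_pointer: int) -> int:
    """Map a Stack-Pointer-relative local register number to an absolute one.

    Assumption: the word-index part of the Stack Pointer (bits 8..2)
    is added to the local register number, modulo 128.
    """
    if not 0 <= local_num < NUM_LOCAL:
        raise ValueError("local register number must be 0..127")
    sp_index = (stack_pointer >> 2) & 0x7F
    return LOCAL_BASE + (local_num + sp_index) % NUM_LOCAL


caller_sp = 0x1000
callee_sp = caller_sp - 16            # callee allocates four words
caller_reg = local_to_absolute(2, caller_sp)
callee_reg = local_to_absolute(6, callee_sp)
```

Under these assumptions, the caller's local register 2 and the callee's local register 6 name the same physical register, which is how values can pass between procedures without being copied through memory.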

The Condition Code Accumulator Register is mapped to both Global Register 2 and
Global Register 3. This register can be used to accumulate into a single condition 
code the Boolean values produced by several operations. The condition code can 
then be used as an operand in further operations, for example, as a control parameter 
for conditional branches. 

The general-purpose registers may be accessed indirectly, with the register number 
specified by the content of a special-purpose register (see below) rather than by an 
instruction field. Three independent indirect register numbers are contained in three
separate special-purpose registers. Indirect addressing is accomplished by specifying
Global Register 0 as an instruction operand or result register. An instruction can
specify an indirect register access for any or all of the source operands or result. 

General-purpose registers may be partitioned into segments of 16 registers for the 
purpose of access protection. A register in a protected segment may be accessed 
only by a program executing in the Supervisor or Monitor modes. An attempted ac- 
cess (either read or write) by a User-mode program causes a trap to occur. 
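The protection check can be pictured with the short sketch below. It assumes a bit-mask in which bit i protects the segment of registers numbered 16*i through 16*i + 15; the mask layout and the function name are assumptions for illustration only.

```python
BANK_SIZE = 16   # registers per protected segment (from the text)


def access_traps(reg_num: int, protect_mask: int, user_mode: bool) -> bool:
    """Return True if the access would trap under the assumed model:
    bit i of protect_mask protects registers 16*i .. 16*i + 15."""
    bank = reg_num // BANK_SIZE
    protected = bool((protect_mask >> bank) & 1)
    return user_mode and protected
```

For example, with `protect_mask = 0b100`, a User-mode access to register 40 traps, while the same access in Supervisor mode, or a User-mode access to register 48, does not.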

2.1.2.2 FLOATING-POINT ACCUMULATOR REGISTERS (see Section 3.2.2)

The Am29050 microprocessor contains four double-precision floating-point accumula- 
tor registers for use with the floating-point multiply-accumulate and multiply-sum 
operations. Instructions are also provided for writing and reading the accumulator 
registers directly. 

2.1.2.3 SPECIAL-PURPOSE REGISTERS (see Section 3.2.3) 

The Am29050 microprocessor contains 39 special-purpose registers. These registers 
provide controls and data for certain processor functions. 

Special-purpose registers are accessed by data movement only. Any special-purpose 
register can be written with the contents of any general-purpose register or a 16-bit
immediate field, and any general-purpose register can be written with the contents of 
any special-purpose register. Operations cannot be performed directly on the 
contents of special-purpose registers. 

Some special-purpose registers are protected, and can be accessed only in the 
Supervisor or Monitor modes. This restriction applies to both read and write accesses. 
An attempt by a User-mode program to access a protected register causes a trap to 
occur. 

The protected special-purpose registers are defined as follows: 

0. VAB: Vector Area Base Address — Defines the beginning of the interrupt/trap 
Vector Area. 

1. OPS: Old Processor Status— Receives a copy of the Current Processor Status 
(see below) when an interrupt or trap is taken. It is later used to restore the 
Current Processor Status on an interrupt return. 

2. CPS: Current Processor Status — Contains control information associated with 
the currently executing process, such as interrupt disables and the Supervisor 
Mode bit. 



3. CFG: Configuration— Contains control information which normally varies only from 
system to system, and usually is set only during system initialization.

4. CHA: Channel Address— Contains the address associated with an external 
access, and retains the address if the access does not complete successfully. The 
Channel Address Register, in conjunction with the Channel Data and Channel 
Control registers described below, allow the restarting of unsuccessful external 
accesses. This might be necessary for an access encountering a page fault in a 
demand-paged environment, for example. 

5. CHD: Channel Data — Contains data associated with a store operation, and retains 
the data if the operation does not complete successfully. 

6. CHC: Channel Control — Contains control information associated with a channel 
operation, and retains this information if the operation does not complete 
successfully. 

7. RBP: Register Bank Protect— Restricts access of User-mode programs to 
specified groups of 16 registers. This protects operating-system parameters kept 
in the global registers from corruption by User-mode programs. 

8. TMC: Timer Counter — Supports real-time control and other timing-related
functions. 

9. TMR: Timer Reload— Maintains synchronization of the Timer Counter. It includes 
control bits for the Timer Facility. 

10. PC0: Program Counter 0 — Contains the address of the instruction being decoded
when an interrupt or trap is taken. The processor restarts this instruction upon
interrupt return.

11. PC1: Program Counter 1 — Contains the address of the instruction being executed
when an interrupt or trap is taken. The processor restarts this instruction upon
interrupt return.

12. PC2: Program Counter 2 — Contains the address of the instruction just completed
when an interrupt or trap is taken. This address is provided for information only,
and does not participate in an interrupt return. 

13. MMU: MMU Configuration — Allows selection of various memory-management 
options, such as page size. 

14. LRU: LRU Recommendation— Simplifies the reload of entries in the Translation 
Look-Aside Buffer (TLB) by providing information on the least-recently used entry 
of the TLB when a TLB miss occurs (see Section 2.1.6).

15. RSN: Reason Vector — Contains the vector number of the synchronous trap which 
caused entry into the Monitor mode. 

16. RMA0: Region Mapping Address 0 — Specifies a mapping from a region of virtual
address space to physical address space; contains the Virtual Base Address 
(VBA) and the corresponding Physical Base Address (PBA) (see Section 3.6.2). 

17. RMC0: Region Mapping Control 0 — Contains control information associated with
the region mapping specified by the Region Mapping Address Register 0. 

18. RMA1: Region Mapping Address 1 — Specifies a mapping from a region of virtual
address space to physical address space; contains the Virtual Base Address 
(VBA) and the corresponding Physical Base Address (PBA). 

19. RMC1: Region Mapping Control 1 — Contains control information associated with
the region mapping specified by the Region Mapping Address Register 1. 


20. SPC0: Shadow Program Counter 0 — Contains the address of the instruction being
decoded when the processor enters Monitor mode. The processor restarts this 
instruction upon return from Monitor mode (see Section 3.7). 

21. SPC1: Shadow Program Counter 1 — Contains the address of the instruction being
executed when the processor enters Monitor mode. The processor restarts this 
instruction upon return from Monitor mode. 

22. SPC2: Shadow Program Counter 2— Contains the address of the instruction just 
completed when the processor enters Monitor mode. This address is provided for 
information only, and does not participate in the return from Monitor mode. 

23. IBA0: Instruction Breakpoint Address 0 — Contains the address of an instruction
breakpoint (see Section 3.7). 

24. IBC0: Instruction Breakpoint Control 0 — Contains control and status information
for the breakpoint comparison specified by the Instruction Breakpoint Address 
Register 0. 

25. IBA1: Instruction Breakpoint Address 1 — Contains the address of an instruction
breakpoint. 

26. IBC1: Instruction Breakpoint Control 1 — Contains control and status information
for the breakpoint comparison specified by the Instruction Breakpoint Address
Register 1.

The unprotected special-purpose registers are defined as follows: 

128. IPC: Indirect Pointer C — Allows the indirect access of a general-purpose register. 

129. IPA: Indirect Pointer A— Allows the indirect access of a general-purpose register. 

130. IPB: Indirect Pointer B— Allows the indirect access of a general-purpose register. 

131. Q: Q — Provides additional operand bits for multiply step, divide step, and divide
operations. 

132. ALU: ALU Status— Contains information about the outcome of integer arithmetic 
and logical operations, and holds residual control for certain instruction 
operations. 

133. BP: Byte Pointer— Contains an index of a byte or half-word within a word. This 
register is also accessible via the ALU Status Register. 

134. FC: Funnel Shift Count — Provides a bit offset for the extraction of word-length
fields from double-word operands. This register is also accessible via the ALU 
Status Register. 

135. CR: Load/Store Count Remaining— Maintains a count of the number of loads and 
stores remaining for load-multiple and store-multiple operations. The count is 
initialized to the total number of loads or stores to be performed before the
operation is initiated. This register is also accessible via the Channel Control 
Register. 

160. FPE: Floating-Point Environment— Controls the operation of floating-point 
arithmetic, such as rounding modes and exception reporting. 

161. INTE: Integer Environment — Enables and disables the reporting of exceptions
which occur during integer multiply and divide operations. 

162. FPS: Floating-Point Status— Contains information about the outcome of 
floating-point operations. 
164. EXOP: Exception Opcode — Reports the operation code of an instruction causing
a trap. This register is provided primarily for recovery from floating-point 
exceptions, but is also set for other instructions that cause traps. 

2.1.2.4 TLB REGISTERS (see Section 3.2.4) 

Translation Look-Aside Buffer (TLB) entries in the Am29050 Memory Management 
Unit are accessed via 128 TLB registers. A single TLB entry appears as two TLB 
registers; TLB registers are thus paired according to the corresponding TLB entry. 

TLB registers are accessed by data movement only. Any TLB register can be written 
with the contents of any general-purpose register, and any general-purpose register 
can be written with the contents of any TLB register. Operations cannot be performed 
directly on the contents of TLB registers. 

TLB registers can be accessed only in the Supervisor mode. This restriction applies to 
both read and write accesses. An attempt by a User-mode program to access a TLB 
register causes a trap to occur. 

2.1.3 Instruction Set Overview (see Section 3.3 and Chapter 8)

The three-address architecture of the Am29050 microprocessor instruction set allows 
a compiler or assembly-language programmer to prevent the destruction of operands, 
and aids register allocation and operand reuse. Instruction operands may be 
contained in any two of the 192 general-purpose registers, and instruction results may 
be stored in any of the 192 general-purpose registers. 

The compiler or assembly-language programmer has complete freedom to allocate 
register usage. There is no dedication of a particular register or register group to a 
particular class of operations. The instruction set is designed to minimize the number 
of side effects and implicit operations of instructions. 

Most Am29050 microprocessor instructions can accept an 8-bit constant as one of the 
source operands. Larger constants are constructed using one or two additional 
instructions and a general-purpose register. Relative branch instructions specify a
16-bit, signed, word offset. Absolute branches specify a 16-bit word address.
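The construction of a large constant can be sketched as below, using the Constant (CONST) and Constant, High (CONSTH) names from Table 2-1. The Python models assume that CONST writes a 16-bit value into the low half-word and clears the high half-word, and that CONSTH replaces only the high half-word; the branch helper further assumes that the offset is applied to the address of the branch instruction itself. These semantics and all function names are assumptions for illustration; Chapter 8 has the exact definitions.

```python
MASK32 = 0xFFFFFFFF


def const(value16: int) -> int:
    """Assumed model of CONST: low half-word set, high half-word cleared."""
    return value16 & 0xFFFF


def consth(reg: int, value16: int) -> int:
    """Assumed model of CONSTH: high half-word replaced, low half-word kept."""
    return ((value16 & 0xFFFF) << 16) | (reg & 0xFFFF)


def load_constant(value32: int) -> int:
    """Build a 32-bit constant with one or two instructions."""
    value32 &= MASK32
    reg = const(value32 & 0xFFFF)
    if value32 >> 16:                      # second instruction only if needed
        reg = consth(reg, value32 >> 16)
    return reg


def relative_branch_target(branch_address: int, word_offset: int) -> int:
    """Convert a 16-bit signed word offset to a byte address (4-byte words)."""
    if not -0x8000 <= word_offset <= 0x7FFF:
        raise ValueError("offset does not fit in 16 signed bits")
    return (branch_address + 4 * word_offset) & MASK32
```

With 4-byte words, a 16-bit signed word offset reaches roughly 128 Kbytes in either direction from the branch.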

The Am29050 microprocessor instruction set contains 125 instructions. These 
instructions are divided into nine classes:

1. Integer Arithmetic — Perform integer add, subtract, multiply, and divide operations.

2. Compare — Perform arithmetic and logical comparisons. Some instructions in this 
class allow the generation of a trap if the comparison condition is not met. 

3. Logical— Perform a set of bit-wise Boolean operations. 

4. Shift — Perform arithmetic and logical shifts, and allow the extraction of 32-bit 
words from 64-bit double-words. 

5. Data Movement— Perform movement of data fields between registers, and the 
movement of data to and from external devices and memories. 

6. Constant— Allow the generation of large constant values in registers. 

7. Floating-Point— Perform floating-point arithmetic, comparisons, and format 
conversions. 

8. Branch— Perform program jumps and subroutine calls. 

9. Miscellaneous— Perform miscellaneous control functions and operations not 
provided by other classes. 
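The extraction of a 32-bit word from a 64-bit double-word, mentioned under the Shift class, can be sketched as a funnel shift. The model below assumes that the two source words are concatenated, shifted left by the Funnel Shift Count, and that the high-order word of the result is kept; the function name and this exact behavior are assumptions for illustration.

```python
MASK32 = 0xFFFFFFFF


def extract(src_a: int, src_b: int, shift_count: int) -> int:
    """Return the 32-bit field that starts shift_count bits below the
    most-significant bit of the 64-bit value src_a:src_b (assumed model)."""
    if not 0 <= shift_count <= 31:
        raise ValueError("shift count must be 0..31")
    double_word = ((src_a & MASK32) << 32) | (src_b & MASK32)
    return (double_word >> (32 - shift_count)) & MASK32
```

With `src_a = 0x11223344` and `src_b = 0x55667788`, a shift count of 8 yields `0x22334455`, a word that straddles the boundary between the two sources.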
Figure 2-1



The Am29050 microprocessor executes all instructions in a single cycle, except for 
floating-point operations, interrupt returns, Load Multiple, and Store Multiple. 

Table 2-1 lists all Am29050 microprocessor instructions alphabetically by instruction 
mnemonic. Table 2-1 is provided only to give a general overview of the instruction set. 
Section 3.3 defines the instructions grouped into classes, and Chapter 8 provides a 
detailed specification of the instruction set. 
2.1.4 Data Formats And Handling (see Section 3.4)

This section introduces the data formats and data-manipulation mechanisms which 
are supported by the Am29050 microprocessor. 

2.1.4.1 DATA TYPES (see Sections 3.4.1, 3.4.2, and 3.4.3)

A word is defined as 32 bits of data. A half-word consists of 16 bits, and a
double-word consists of 64 bits. Bytes are 8 bits in length. The Am29050 
microprocessor has direct support for single- and double-precision floating-point, 
word-integer (signed and unsigned), word-logical, word-Boolean, half-word integer 
(signed and unsigned), and character (signed and unsigned) data. Other data types, 
such as character strings, are supported with sequences of basic instructions.

The format for Boolean data used by the processor is such that the Boolean values 
TRUE and FALSE are represented by 1 and 0, respectively, in the most-significant bit 
of a word. 
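A minimal sketch of this Boolean format follows. Only the most-significant bit is defined by the text above; clearing the remaining 31 bits when a value is produced is an assumption of this example, as are the names used.

```python
MSB = 0x80000000   # most-significant bit of a 32-bit word


def to_boolean_word(flag: bool) -> int:
    """Encode a Python bool as a word with TRUE = 1 in the top bit.
    The lower 31 bits are cleared here (an assumption of this sketch)."""
    return MSB if flag else 0


def is_true(word: int) -> bool:
    """Decode by testing only the most-significant bit."""
    return bool(word & MSB)
```

One consequence of this format is that a word viewed as a signed integer is TRUE exactly when it is negative, since the Boolean occupies the sign-bit position.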
Table 2-1  Am29050 Microprocessor Instruction Set

Mnemonic    Instruction Name

ADD         Add
ADDC        Add with Carry
ADDCS       Add with Carry, Signed
ADDCU       Add with Carry, Unsigned
ADDS        Add, Signed
ADDU        Add, Unsigned
AND         AND Logical
ANDN        AND-NOT Logical
ASEQ        Assert Equal To
ASGE        Assert Greater Than or Equal To
ASGEU       Assert Greater Than or Equal To, Unsigned
ASGT        Assert Greater Than
ASGTU       Assert Greater Than, Unsigned
ASLE        Assert Less Than or Equal To
ASLEU       Assert Less Than or Equal To, Unsigned
ASLT        Assert Less Than
ASLTU       Assert Less Than, Unsigned
ASNEQ       Assert Not Equal To
CALL        Call Subroutine
CALLI       Call Subroutine, Indirect
CLASS       Classify Floating-Point Operand
CLZ         Count Leading Zeros
CONST       Constant
CONSTH      Constant, High
CONSTHZ     Constant High, Zero Lower
CONSTN      Constant, Negative
CONVERT     Convert Data Format
CPBYTE      Compare Bytes
CPEQ        Compare Equal To
CPGE        Compare Greater Than or Equal To
CPGEU       Compare Greater Than or Equal To, Unsigned
CPGT        Compare Greater Than
CPGTU       Compare Greater Than, Unsigned
CPLE        Compare Less Than or Equal To
CPLEU       Compare Less Than or Equal To, Unsigned
CPLT        Compare Less Than
CPLTU       Compare Less Than, Unsigned
CPNEQ       Compare Not Equal To
DADD        Floating-Point Add, Double-Precision
DDIV        Floating-Point Divide, Double-Precision
DEQ         Floating-Point Equal To, Double-Precision
DGE         Floating-Point Greater Than or Equal To, Double-Precision
DGT         Floating-Point Greater Than, Double-Precision
DIV         Divide Step
DIV0        Divide Initialize
DIVIDE      Integer Divide, Signed
DIVIDU      Integer Divide, Unsigned
DIVL        Divide Last Step
DIVREM      Divide Remainder
DMAC        Floating-Point Multiply-Accumulate, Double-Precision
DMUL        Floating-Point Multiply, Double-Precision
DMSM        Floating-Point Multiply-Sum, Double-Precision
DSUB        Floating-Point Subtract, Double-Precision
EMULATE     Trap to Software Emulation Routine
EXBYTE      Extract Byte
EXHW        Extract Half-Word
EXHWS       Extract Half-Word, Sign-Extended
EXTRACT     Extract Word, Bit-Aligned
FADD        Floating-Point Add, Single-Precision
FDIV        Floating-Point Divide, Single-Precision
FDMUL       Floating-Point Multiply, Single-to-Double Precision
FEQ         Floating-Point Equal To, Single-Precision
FGE         Floating-Point Greater Than or Equal To, Single-Precision
FGT         Floating-Point Greater Than, Single-Precision
FMAC        Floating-Point Multiply-Accumulate, Single-Precision
FMUL        Floating-Point Multiply, Single-Precision
FMSM        Floating-Point Multiply-Sum, Single-Precision
FSUB        Floating-Point Subtract, Single-Precision
HALT        Enter Halt Mode
INBYTE      Insert Byte
INHW        Insert Half-Word
INV         Invalidate
IRET        Interrupt Return
IRETINV     Interrupt Return and Invalidate
JMP         Jump
JMPF        Jump False
JMPFDEC     Jump False and Decrement
JMPFI       Jump False Indirect
JMPI        Jump Indirect
JMPT        Jump True
JMPTI       Jump True Indirect
LOAD        Load
LOADL       Load and Lock
LOADM       Load Multiple
LOADSET     Load and Set
MFACC       Move from Accumulator
MFSR        Move from Special Register
MFTLB       Move from Translation Look-Aside Buffer Register
MTACC       Move to Accumulator
MTSR        Move to Special Register
MTSRIM      Move to Special Register Immediate
MTTLB       Move to Translation Look-Aside Buffer Register
MUL         Multiply Step
MULL        Multiply Last Step
MULTIPLU    Integer Multiply, Unsigned
MULTIPLY    Integer Multiply, Signed
MULTM       Integer Multiply Most-Significant Bits, Signed
MULTMU      Integer Multiply Most-Significant Bits, Unsigned
MULU        Multiply Step, Unsigned
NAND        NAND Logical
NOR         NOR Logical
OR          OR Logical
ORN         OR-NOT Logical
SETIP       Set Indirect Pointers
SLL         Shift Left Logical
SQRT        Floating-Point Square Root
SRA         Shift Right Arithmetic
SRL         Shift Right Logical
STORE       Store
STOREL      Store and Lock
STOREM      Store Multiple
SUB         Subtract
SUBC        Subtract with Carry
SUBCS       Subtract with Carry, Signed
SUBCU       Subtract with Carry, Unsigned
SUBR        Subtract Reverse
SUBRC       Subtract Reverse with Carry
SUBRCS      Subtract Reverse with Carry, Signed
SUBRCU      Subtract Reverse with Carry, Unsigned
SUBRS       Subtract Reverse, Signed
SUBRU       Subtract Reverse, Unsigned
SUBS        Subtract Signed
SUBU        Subtract Unsigned
XNOR        Exclusive-NOR Logical
XOR         Exclusive-OR Logical



Figure 2-1 illustrates the numbering conventions for data units contained in a word.
Within a word, bits are numbered in increasing order from right-to-left, starting with
the number 0 for the least-significant bit. Bytes and half-words within a word are
numbered in increasing order starting with the number 0. However, bytes and
half-words may be numbered right-to-left (sometimes referred to as "little endian") or
left-to-right (sometimes referred to as "big endian"), as controlled by the Configuration
Register.

Note that the numbering of bits within words is strictly for notational convenience. In
contrast, the numbering conventions for bytes and half-words within words affect
processor operations.
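
The two byte-numbering conventions can be made concrete with a short sketch. The Python below is illustrative only: the function names and the `big_endian` flag are assumptions standing in for the byte-order control in the Configuration Register, and the bit positions follow the usual meaning of the two orderings, with bit 0 as the least-significant bit.

```python
def byte_bit_range(byte_number, big_endian):
    """Return (high_bit, low_bit) of a byte within a 32-bit word.

    Bits are numbered right-to-left with bit 0 least significant.
    `big_endian` is a hypothetical stand-in for the Configuration
    Register control that selects left-to-right byte numbering.
    """
    if not 0 <= byte_number <= 3:
        raise ValueError("byte number must be 0..3")
    # Right-to-left numbering: byte 0 is the low-order byte.
    # Left-to-right numbering: byte 0 is the high-order byte.
    lane = 3 - byte_number if big_endian else byte_number
    low = 8 * lane
    return low + 7, low


def halfword_bit_range(halfword_number, big_endian):
    """Return (high_bit, low_bit) of a half-word within a 32-bit word."""
    if halfword_number not in (0, 1):
        raise ValueError("half-word number must be 0 or 1")
    lane = 1 - halfword_number if big_endian else halfword_number
    low = 16 * lane
    return low + 15, low
```

Under these assumptions, byte 0 occupies bits 7..0 with right-to-left numbering and bits 31..24 with left-to-right numbering, while the bit numbers themselves never change.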

2.1.4.2 EXTERNAL DATA ACCESSES (see Section 3.4.4)

External accesses move data between the processor and external devices and
memories. These accesses occur only as a result of load and store instructions.

Load and store instructions move words of data to and from general-purpose
registers. Each load and store instruction moves a single word. There are load and
store instructions which support interlocking operations necessary for multi-processor
exclusion, synchronization, and communication.

For the movement of multiple words, Load Multiple and Store Multiple instructions
move the contents of sequentially addressed external locations to or from sequentially
numbered general-purpose registers. The Load Multiple and Store Multiple instructions
allow the movement of up to 192 words at a maximum rate of one word per processor cycle.
The multiple load and store sequences can be interrupted, and restarted at the point
of interruption.
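
As a rough illustration of the data movement just described, the following Python sketch copies sequentially addressed words into sequentially numbered registers. It is a behavioral toy, not a description of the hardware: the names `load_multiple`, `memory`, and `regs` are hypothetical, and interruption and restart are not modeled.

```python
MAX_WORDS = 192  # upper bound on a multiple transfer, per the text


def load_multiple(memory, regs, address, first_reg, count):
    """Copy `count` words from sequential word addresses into
    sequentially numbered registers.

    `memory` maps word-aligned byte addresses to 32-bit values and
    `regs` is a list standing in for the general-purpose registers;
    both are illustrative stand-ins, not processor structures.
    """
    if not 1 <= count <= MAX_WORDS:
        raise ValueError("count must be 1..192")
    for i in range(count):
        # Each successive word is 4 bytes further on in memory and
        # lands in the next higher-numbered register.
        regs[first_reg + i] = memory[address + 4 * i]
    return regs
```

A Store Multiple would be the mirror image, with the assignment reversed.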

Load and store instructions provide no mechanism for computing the address
associated with the external data access. All addresses are contained in a
general-purpose register at the beginning of the access, or are given by an 8-bit
instruction constant. Any address computation must be performed explicitly before the
load or store instruction is executed. Since address computations are expressed
directly, they are exposed to compiler optimizations just as any other computations are.
Processor hardware tracks the registers that are being used to contain addresses,
and tracks computations that are for external addresses. This information allows the
processor to reduce the apparent external access time by one cycle in many cases.

External data accesses are overlapped with instruction execution. Processor performance
is improved if instructions that follow loads do not immediately use externally
referenced data. In this manner, the time required to perform the external access is
overlapped with subsequent instruction execution. Because of hardware interlocks,
this concurrency has no effect on the logical behavior of an executing program.

2.1.4.3 ADDRESSING AND ALIGNMENT (see Section 3.4.5) 

External instructions and data are contained in one of five 32-bit address spaces: 

1. Data Memory

2. Input/Output 

3. Coprocessor 

4. Instruction Read-Only Memory (Instruction ROM) 

5. Instruction Random Access Memory (Instruction RAM) 

An address is treated as virtual or physical, as determined by the Current Processor 
Status Register. Address translation for data accesses is enabled separately from 
address translation for instruction accesses. A program in the Supervisor mode can 
temporarily disable address translation for individual loads and stores; this permits 
load-real and store-real operations. 

Bits contained within load and store instructions distinguish between the data
memory, input/output, and coprocessor address spaces. Address translation also
may determine whether an access is performed in the data memory or the
input/output address space. The Current Processor Status Register determines whether
instruction accesses are directed to the instruction RAM address space or
to the instruction ROM address space.

The Am29050 microprocessor does not support data accesses directly to the 
instruction RAM or instruction ROM address space. However, this capability is 
possible as a system option. 

All addresses are interpreted as byte addresses, although accesses are word-oriented.
The number of a byte within a word is given by the two least-significant
address bits. The number of a half-word within a word is given by the
next-to-least-significant address bit.

Since only byte addressing is supported, it is possible that an address for the access 
of a word or half-word is not aligned to the desired word or half-word. For a word 
access, an unaligned address has a 1 in either or both of the two least-significant 
address bits. For a half-word access, an unaligned address has a 1 in the 
least-significant address bit. In many systems, address alignment can be ignored, 
with addresses truncated to access the word or half-word of interest. However, as a 
user option, the Am29050 microprocessor can create a trap when a non-aligned 
access is attempted. The trap allows software emulation of non-aligned accesses. 
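
The address-bit rules above reduce to a few mask operations. The following Python sketch is only an illustration of that arithmetic; the function names are hypothetical and nothing here models the optional trap itself.

```python
def byte_number(address):
    """Byte within the word: the two least-significant address bits."""
    return address & 0x3


def halfword_number(address):
    """Half-word within the word: the next-to-least-significant bit."""
    return (address >> 1) & 0x1


def is_unaligned(address, size):
    """True if `address` is not aligned for an access of `size` bytes.

    `size` is 4 for a word, 2 for a half-word, 1 for a byte.
    """
    if size not in (1, 2, 4):
        raise ValueError("size must be 1, 2, or 4")
    return (address & (size - 1)) != 0


def truncate(address, size):
    """Address with the low-order bits dropped, as in systems that
    ignore alignment and access the containing word or half-word."""
    if size not in (1, 2, 4):
        raise ValueError("size must be 1, 2, or 4")
    return address & ~(size - 1)
```

For example, address 0x1002 is aligned for a half-word access but unaligned for a word access, and truncating it for a word access yields 0x1000.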

In the Am29050 microprocessor, all instructions are 32 bits in length, and are aligned
on word-address boundaries.

2.1.4.4 BYTE AND HALF-WORD ACCESSES (see Section 3.4.6)

The Am29050 microprocessor supports the direct external access of bytes and
half-words as an option. If this option is enabled, the Am29050 microprocessor
selects a byte or half-word within a word on a load, and aligns it to the low-order byte
or half-word of a register. On a store, the low-order byte or half-word of a register is
replicated in all byte or half-word positions, so that the external memory can easily
write the required byte or half-word in memory. This option requires that the external
memory system be able to write individual bytes and half-words within words.

To avoid the memory-system complexity required for writing individual bytes and
half-words, the Am29050 microprocessor can perform byte and half-word accesses
using software alone. As an option for load instructions, the Am29050 microprocessor
can set a byte-position indicator in the ALU Status Register with the two
least-significant bits of the address for the load. To load a byte or half-word, a word
load is first performed. This load sets the byte-position indicator, and a subsequent
instruction extracts the byte or half-word of interest from the accessed word. To store
a byte or half-word, a load is also first performed; the byte or half-word of interest is
inserted into the accessed word, and the resulting word then is stored. Even if the
Am29050 microprocessor is configured to perform byte and half-word accesses in
hardware, this software-only technique operates correctly; this allows software to be
upward-compatible from simpler systems to more complex systems.
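
The software-only sequence (word load, then extract; or word load, insert, then word store) can be sketched in Python as follows. This is a behavioral illustration under stated assumptions: the helper names are hypothetical, `memory` is a plain dictionary of word-aligned addresses, and the byte order is passed in as a `big_endian` argument rather than read from the Configuration Register.

```python
MASK32 = 0xFFFFFFFF


def _shift(byte_position, big_endian):
    # Bit offset of the selected byte within the word. The byte order
    # is an explicit argument here, which is an assumption of the sketch.
    lane = 3 - byte_position if big_endian else byte_position
    return 8 * lane


def extract_byte(word, byte_position, big_endian=True):
    """Return the selected byte, right-justified and zero-extended."""
    return (word >> _shift(byte_position, big_endian)) & 0xFF


def insert_byte(word, byte_position, value, big_endian=True):
    """Return `word` with the selected byte replaced by `value`."""
    shift = _shift(byte_position, big_endian)
    cleared = word & ~(0xFF << shift) & MASK32
    return cleared | ((value & 0xFF) << shift)


def load_byte(memory, address, big_endian=True):
    """Word load followed by a byte extract."""
    word = memory[address & ~0x3]        # word load
    position = address & 0x3             # models the byte-position indicator
    return extract_byte(word, position, big_endian)


def store_byte(memory, address, value, big_endian=True):
    """Word load, byte insert, then word store."""
    word_address = address & ~0x3
    position = address & 0x3             # models the byte-position indicator
    word = memory[word_address]          # word load
    memory[word_address] = insert_byte(word, position, value, big_endian)
```

Because only whole words are ever written, the memory system in this sketch never needs byte-write capability, which is the point of the technique.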

2.1.5 Interrupts And Traps (see Section 3.5)

Normal program flow may be preempted by an interrupt or trap for which the
processor is enabled. The effect on the processor is identical for interrupts and traps;
the distinction is in the different mechanisms by which interrupts and traps are
enabled. It is intended that interrupts be used for suspending current program
execution and causing another program to execute, while traps are used to report
errors and exceptional conditions.

The interrupt and trap mechanism supports high-speed, temporary context switching 
and user-defined interrupt-processing mechanisms. 

2.1.5.1 TEMPORARY CONTEXT SWITCHING

The basic interrupt/trap mechanism of the Am29050 microprocessor supports
temporary context switching. During the temporary context switch, the interrupted
context is held in processor registers. The interrupt or trap handler can return
immediately to this context.

Temporary context switching is useful for instruction emulation, TLB reload routines,
and so forth. Many of its features are similar to microprogram execution: processor
context does not have to be saved; interrupts are disabled for the duration of the
program; and all processor resources are accessible, even if the context that was
interrupted is in the User mode. The associated routine may execute from instruction
RAM memory or instruction ROM.

2.1.5.2 USER-DEFINED INTERRUPT PROCESSING

Since the basic interrupt/trap mechanism for the Am29050 microprocessor keeps the 
interrupted context in the processor, dynamically nested interrupts are not supported 
directly. The context in the processor must be saved before another interrupt or trap 
can be taken. 

The interrupt or trap handler executing during a temporary context switch is not
required to return to the interrupted context. This routine optionally may save the
interrupted context, load a new one, and return to the new context.

The implementation of the saving and restoring of contexts is completely user-defined.
Thus, the context save/restore mechanism used (e.g., interrupt stack, program
status word area, etc.) and the amount of context saved can be tailored to the needs
of the system.

2.1.5.3 VECTOR AREA (see Section 3.5.4)

Interrupt and trap dispatching occurs through a relocatable Vector Area which
accommodates as many as 256 interrupt and trap handling routines. Entries into the
Vector Area are associated with various sources of interrupts and traps; some are
pre-defined, while others are user-defined.

The Vector Area is either a table of vectors in data memory, where each vector points
to the beginning of an interrupt or trap handler, or it is a segment of instruction/data
memory (or instruction ROM) containing the actual routines. The latter configuration
for the Vector Area yields better interrupt performance at the cost of additional
memory.

2.1.6 Memory Management (see Section 3.6)

The Am29050 microprocessor incorporates a Memory Management Unit (MMU) that
accepts a 32-bit virtual byte-address and translates it to a 32-bit physical byte-address
in a single cycle. Address translation in the MMU is performed either by a
64-entry Translation Look-Aside Buffer (TLB) or by one of two Region Mapping Units
(RMUs). The MMU is not dedicated to any particular address-translation architecture.

2.1.6.1 TRANSLATION LOOK-ASIDE BUFFER 

The TLB is an associative table which contains the most-recently used address
translations for the processor. If the translation for a given address cannot be
performed by the TLB, a TLB miss occurs, and causes a trap which allows the
required translation to be placed into the TLB.

Processor hardware maintains information for each TLB line indicating which entry
was least recently used; when a TLB miss occurs, this information is used to indicate
the TLB entry to be replaced. Software is responsible for searching system page
tables and modifying the indicated TLB entry as appropriate. This allows the page
tables to be defined according to the system environment.

TLB entries are modified directly by processor instructions. A TLB entry consists of 64 
bits and appears as two word-length TLB registers which may be inspected and 
modified by instructions. 

TLB entries are tagged with a Task Identifier field, which allows the operating system
to create a unique 32-bit virtual address space for each of 256 processes. In addition,
TLB entries provide support for memory protection and user-defined control
information.
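
The division of labor described in this section (hardware lookup and replacement hint, software reload on a miss, entries tagged by task) can be sketched with a toy model. Everything structural in the Python below is an assumption made for illustration: the 4-Kbyte page size, the organization as 32 lines of two entries each, the class and function names, and the dictionary used as a page table. Protection bits and the real register layout are not modeled.

```python
PAGE_SHIFT = 12      # assumed 4-Kbyte pages
LINES = 32           # assumed: 32 lines x 2 entries = 64 entries
WAYS = 2


class TLBMiss(Exception):
    """Stands in for the TLB-miss trap."""


class TLB:
    def __init__(self):
        # Each entry is (task_id, virtual_page, physical_page) or None.
        self.lines = [[None] * WAYS for _ in range(LINES)]
        self.lru = [0] * LINES          # per line: the entry to replace next

    def lookup(self, task_id, vaddr):
        vpage = vaddr >> PAGE_SHIFT
        line = vpage % LINES
        for way, entry in enumerate(self.lines[line]):
            if entry is not None and entry[:2] == (task_id, vpage):
                self.lru[line] = 1 - way    # the other entry is now LRU
                offset = vaddr & ((1 << PAGE_SHIFT) - 1)
                return (entry[2] << PAGE_SHIFT) | offset
        raise TLBMiss(line)

    def reload(self, task_id, vaddr, page_table):
        """Software miss handler: search the page table and fill the
        entry that the replacement information points at."""
        vpage = vaddr >> PAGE_SHIFT
        line = vpage % LINES
        way = self.lru[line]
        self.lines[line][way] = (task_id, vpage, page_table[(task_id, vpage)])
        self.lru[line] = 1 - way


def translate(tlb, task_id, vaddr, page_table):
    """Translate, invoking the software reload routine on a miss."""
    try:
        return tlb.lookup(task_id, vaddr)
    except TLBMiss:
        tlb.reload(task_id, vaddr, page_table)
        return tlb.lookup(task_id, vaddr)
```

Because the task identifier is part of the match, two tasks can map the same virtual page to different physical pages and coexist in the table without a flush. The page-table format is left entirely to `reload`, mirroring the way the processor leaves it to software.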

2.1.6.2 REGION MAPPING UNITS 

In addition to the page-by-page translation provided by the TLB, the Am29050
microprocessor supports translation for variable-sized regions, ranging from 64 Kbytes
to 2 Gbytes, by means of two Region Mapping Units.

Each RMU consists of two special-purpose registers. One of the registers in each
RMU contains the base address of the virtual region to be mapped and the base
address of the corresponding physical region. The other register specifies the region
size and contains information which is used to control access, including a Task
Identifier.
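
Region-based translation amounts to a range check followed by a base substitution. The Python sketch below illustrates this under several assumptions that are not taken from this manual: region sizes are powers of two, a region matches only when the Task Identifier is equal, and the class and function names are hypothetical. The real register encoding is not modeled.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

KB64 = 64 * 1024
GB2 = 2 * 1024 ** 3


@dataclass
class Region:
    """Hypothetical software view of one Region Mapping Unit."""
    virtual_base: int
    physical_base: int
    size: int          # bytes; assumed to be a power of two
    task_id: int

    def __post_init__(self):
        if not KB64 <= self.size <= GB2:
            raise ValueError("region size must be 64 Kbytes .. 2 Gbytes")
        if self.size & (self.size - 1):
            raise ValueError("region size assumed to be a power of two")

    def translate(self, vaddr: int, task_id: int) -> Optional[int]:
        offset = vaddr - self.virtual_base
        if task_id == self.task_id and 0 <= offset < self.size:
            return self.physical_base + offset
        return None


def rmu_translate(rmus: Sequence[Region], vaddr: int, task_id: int):
    """Try each region in list order; return None if none matches, in
    which case translation would be left to the TLB."""
    for region in rmus:
        paddr = region.translate(vaddr, task_id)
        if paddr is not None:
            return paddr
    return None
```

Passing the regions as an ordered list means that when two regions overlap, the first one in the list supplies the translation.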

The RMUs have priority over the TLB translation; in addition, RMU0 has priority over
RMU1.

2.1.7 Coprocessor Programming (see Section 6.1)

The coprocessor interface for the Am29050 microprocessor allows a program to 
communicate with an off-chip coprocessor for performing operations not supported by 
processor hardware directly. 

The coprocessor interface allows the program to transfer operands and operation 
codes to the coprocessor, and then perform other operations while the coprocessor 
operation is in progress. The results of the operation are read from the coprocessor 
by a separate transfer. The processor may transfer multiple operands to the 
coprocessor without re-transferring operation codes or reading intermediate results. 
As many as 64 bits of information can be transferred to the coprocessor in a single 
cycle. 

The Am29050 microprocessor includes features that support the definition of the 
coprocessor as a system option. In this case, coprocessor operations are emulated 
by software when the coprocessor is not present in a system. 

2.1.8 Timer Facility (see Section 7.3.6)

The Timer Facility provides a counter for implementing a real-time clock or other
software timing functions. This facility consists of two special-purpose registers:
the Timer Counter Register, which decrements at a rate equal to the processor
operating frequency, and the Timer Reload Register, which re-initializes the Timer Counter
Register when it decrements to zero. The Timer Facility optionally may create an
interrupt when the Timer Counter decrements to zero.
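
The counter-and-reload arrangement can be illustrated with a small Python model. It is a sketch only: the class and attribute names are hypothetical, and the exact cycle in which the reload takes effect is an assumption of the model rather than something stated here.

```python
class Timer:
    """Toy model of a down-counter with automatic reload.

    `reload` plays the role of the Timer Reload Register and `counter`
    the Timer Counter Register. Cycle-exact behavior is assumed, not
    taken from the manual.
    """

    def __init__(self, reload, interrupt_enabled=True):
        if reload < 1:
            raise ValueError("reload value must be at least 1")
        self.reload = reload
        self.counter = reload
        self.interrupt_enabled = interrupt_enabled
        self.interrupts = 0

    def tick(self):
        """Advance one processor cycle."""
        self.counter -= 1
        if self.counter == 0:
            if self.interrupt_enabled:
                self.interrupts += 1
            self.counter = self.reload   # re-initialize from the reload value


def cycles_per_tick(clock_hz, tick_hz):
    """Reload value giving a periodic tick, e.g. for a real-time clock."""
    return clock_hz // tick_hz
```

In this model, a reload value of 5 produces one interrupt every five cycles; a hypothetical 40-MHz clock and a 100-Hz tick would call for a reload value of 400,000.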

2.1.9 Trace Facility (see Section 3.7) 

The Trace Facility allows a debug program to emulate single-instruction stepping in a
program under test. This facility allows a trap to be generated after the execution of
any instruction in the program being tested.

Using the Trace Facility, the debug program can inspect and modify the state of the
program at every instruction boundary. The Trace Facility is designed to work properly
in the presence of normal system interrupts and traps.

2.2 HARDWARE OVERVIEW 

This section briefly describes the operation of Am29050 microprocessor hardware. It 
introduces the processor pipeline and the three major internal functional units: the 
Instruction Fetch Unit, the Execution Unit, and the Memory Management Unit. Finally, 
the processor's operational modes are described. 

Figure 2-2 shows the Am29050 microprocessor internal data-flow organization. The 
following sections refer to the various components on this data-flow diagram. 

2.2.1 Four-Stage Pipeline (see Section 4.1) 

The Am29050 microprocessor implements a four-stage pipeline for integer instruction 
execution. The four stages are: fetch, decode, execute, and write-back. The pipeline 
is organized so that the effective instruction-execution rate is as high as one 
instruction per cycle. Data forwarding and pipeline interlocks are handled by 
processor hardware.

Figure 2-2

The execute stage of Am29050 microprocessor floating-point operations is further
pipelined to a depth determined by the latency of the operation. The Am29050
microprocessor can therefore issue most floating-point operations at a rate of one operation
per cycle, though most operations take more than one cycle to complete.
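
The benefit of issuing one operation per cycle despite multi-cycle latency is simple arithmetic, sketched below in Python. The functions are illustrative, assume that the operations are independent of one another, and use a latency value chosen for the example rather than one taken from this manual.

```python
def pipelined_cycles(n_ops, latency):
    """Cycles to finish n independent operations issued one per cycle:
    the last operation issues in cycle n and completes latency - 1
    cycles later."""
    if n_ops < 1 or latency < 1:
        raise ValueError("n_ops and latency must be positive")
    return n_ops + latency - 1


def unpipelined_cycles(n_ops, latency):
    """Cycles if each operation had to finish before the next began."""
    if n_ops < 1 or latency < 1:
        raise ValueError("n_ops and latency must be positive")
    return n_ops * latency
```

With a hypothetical three-cycle latency, ten independent operations take 12 cycles when pipelined instead of 30, so the cost approaches one cycle per operation as the sequence grows.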

2.2.2 Instruction Fetch Unit (see Section 4.2) 

The Instruction Fetch Unit fetches instructions, and supplies instructions to other 
functional units. It incorporates the Instruction Prefetch Buffer, the Branch Target 
Cache memory, and the Program Counter Unit. All components of the Instruction 
Fetch Unit operate during the fetch stage of the processor pipeline. 

2.2.2.1 INSTRUCTION PREFETCH BUFFER (see Section 4.2.1) 

Most instructions executed by the Am29050 microprocessor are fetched from external
instruction memory. The processor prefetches instructions so that they are requested
at least four cycles before they are required for execution.

Prefetched instructions are stored in a four-word Instruction Prefetch Buffer while
awaiting execution. An instruction-prefetch request occurs whenever there is a free
location in this buffer (if the processor is otherwise enabled to fetch instructions).
When a non-sequential instruction fetch occurs, prefetching is terminated, and then
restarted for the new instruction stream.

Instruction prefetching de-couples the instruction-fetch rate from the
instruction-access latency. For example, an instruction may be transferred to the processor two
cycles after it is requested. However, as long as instructions are supplied to the
processor at an average rate of one instruction per cycle, this latency has no effect on the
instruction-execution rate.

2.2.2.2 BRANCH TARGET CACHE MEMORY (see Section 4.2.2)

The Am29050 microprocessor incorporates a Branch Target Cache memory which
contains as many as 256 instructions. The Branch Target Cache memory is a
two-way, set-associative cache containing the first target instructions of a number of
recently taken branches. The Branch Target Cache memory can be configured, under
software control, to cache either two instructions for each branch or four instructions.
Each of the two sets in the Branch Target Cache memory contains 128 instructions,
and the 128 instructions are further divided either into 32 blocks of four instructions
each or into 64 blocks of two instructions each.
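
The two configurations trade block length against the number of branch targets that can be held. The Python sketch below simply reproduces the arithmetic from the figures above; the function name and return layout are illustrative choices.

```python
TOTAL_INSTRUCTIONS = 256
SETS = 2  # two-way set-associative


def btc_geometry(block_size):
    """Return (instructions_per_set, blocks_per_set, total_blocks)
    for a block size of two or four instructions."""
    if block_size not in (2, 4):
        raise ValueError("block size is either 2 or 4 instructions")
    per_set = TOTAL_INSTRUCTIONS // SETS
    blocks_per_set = per_set // block_size
    return per_set, blocks_per_set, blocks_per_set * SETS
```

With four-instruction blocks the cache holds 64 blocks in total; with two-instruction blocks it holds 128, so twice as many branch targets can be cached at the cost of supplying fewer instructions per target.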

The purpose of the Branch Target Cache memory is to provide instructions for the
beginning of a non-sequential instruction-fetch sequence. This keeps the instruction
pipeline full until the processor can establish a new instruction-prefetch stream from
the external instruction memory.

The processor is organized so that branch instructions can execute in a single cycle if
the target instruction sequence is present in the Branch Target Cache memory.

2.2.2.3 PROGRAM COUNTER UNIT (see Section 4.2.4) 

The Program Counter Unit creates and sequences addresses of instructions as they 
are executed by the processor. 

2.2.3 Execution Unit (see Section 4.3) 

The Execution Unit executes instructions. It incorporates the Register File, the
Address Unit, the Arithmetic/Logic Unit, the Field Shift Unit, the Floating-Point Unit, and
the Prioritizer. The Register File and Address Unit operate during the decode stage of
the pipeline. The Arithmetic/Logic Unit, Field Shift Unit, Floating-Point Unit, and
Prioritizer operate during the execute stage of the pipeline. The Register File also
operates during the write-back stage.

2.2.3.1 REGISTER FILE (see Section 4.3.1) 

The general-purpose registers are implemented by a 192-location Register File. The
Register File can perform two 64-bit read accesses and two write accesses in a single
cycle. Normally, two read accesses are performed during the decode pipeline stage
to fetch operands required by the instruction being decoded. One write access during
the same cycle completes the write-back stage of a previously executed integer
instruction, and a second write access completes the write-back stage of a previously
executed floating-point operation. The write port for integer results is 32 bits wide, and
the write port for floating-point results is 64 bits wide.

Addressing logic associated with the Register File distinguishes between the global
and local general-purpose registers, and it performs the Stack-Pointer addressing for
the local registers. Register File addressing functions are performed during the
decode stage.

2.2.3.2 ADDRESS UNIT (see Section 4.3.2) 

The Address Unit evaluates addresses for branches, loads, and stores. It also
assembles instruction-immediate data and computes addresses for load-multiple and
store-multiple sequences.

2.2.3.3 ARITHMETIC/LOGIC UNIT (see Section 4.3.4)

The ALU performs all logical, compare, and integer arithmetic operations (including
multiply step and divide step).

2.2.3.4 FIELD SHIFT UNIT (see Section 4.3.5) 

The Field Shift Unit performs N-bit shifts. The Field Shift Unit also performs byte and
half-word extract and insert operations, and it extracts words from double-words.

2.2.3.5 FLOATING-POINT UNIT (see Section 4.3.7)

The Floating-Point Unit performs single- and double-precision floating-point
operations in accordance with the IEEE Standard for Binary Floating-Point Arithmetic
(ANSI/IEEE Std 754-1985).

2.2.3.6 PRIORITIZER (see Section 4.3.6) 

The Prioritizer provides a count of the number of leading zero bits in a 32-bit word;
this is useful for performing prioritization in a multi-level interrupt handler, for example.
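
A leading-zero count, and its use for picking one pending request out of many, can be sketched in Python as follows. The function names are hypothetical, and treating the most-significant set bit as the highest priority is an assumption made purely for this illustration.

```python
def count_leading_zeros(word):
    """Number of leading zero bits in a 32-bit word (32 for zero)."""
    if not 0 <= word <= 0xFFFFFFFF:
        raise ValueError("word must fit in 32 bits")
    return 32 - word.bit_length()


def highest_priority_pending(pending):
    """Index, counted from the most-significant bit, of the first set
    bit in a 32-bit pending mask, or None if nothing is pending.

    Giving bit 31 the highest priority is an assumption of this sketch.
    """
    if pending == 0:
        return None
    return count_leading_zeros(pending)
```

A handler could use the returned index directly to select among up to 32 service routines, replacing a loop that tests one bit at a time.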

2.2.4 Memory Management Unit (see Section 4.4) 

The Memory Management Unit (MMU) performs address translation and memory- 
protection functions for all branches, loads, and stores. The MMU operates during the 
execute stage of the pipeline, so the physical address that it generates is available at 
the beginning of the write-back stage. 

All addresses for external accesses are physical addresses. MMU operation is 
pipelined with external accesses, so that an address translation can occur while a 
previous access completes. 

Address translation is not performed for the addresses associated with instruction 
prefetching. Instead, these addresses are generated by an instruction prefetch pointer 
which is incremented by the processor. Address translation is performed only at the 
beginning of the prefetch sequence (as the result of a branch instruction), and when 
the prefetch pointer crosses a potential virtual-page boundary. 

2.2.5 Processor Modes 

The Am29050 microprocessor operates in several different modes to accomplish
various processor and system functions. All modes except for Pipeline Hold (see
below) are under direct control of instructions and/or processor control inputs. The
Pipeline Hold mode normally is determined by the relative timing between the
processor and its external system for certain types of operations. The processor provides an
external indication of its operational mode.

2.2.5.1 EXECUTING 

When the processor is in the Executing mode, it fetches and executes instructions as 
described in this manual. External accesses occur as required. 

2.2.5.2 WAIT (see Section 3.5.3) 

When the processor is in the Wait mode, it does not execute instructions, and
performs no external accesses. The Wait mode is controlled by the Current Processor
Status Register. The processor leaves this mode when an interrupt or trap for which it
is enabled occurs, or when a reset occurs.

2.2.5.3 PIPELINE HOLD (see Section 4.5) 

Under certain conditions, processor pipelining might cause non-sequential instruction 
execution or timing-dependent results of execution. For example, the processor might 
attempt to execute an instruction that has not been fetched from instruction/data 
memory. 

For such cases, pipeline-interlock hardware detects the anomalous condition and
suspends processor execution until execution can proceed properly. While execution
is suspended by the interlock hardware, the processor is in the Pipeline Hold mode.
The processor resumes execution when the pipeline-interlock hardware determines
that it is correct to do so.

2.2.5.4 HALT (see Section 5.3.3) 

The Halt mode is provided so that the processor may be placed under the control of a
hardware-development system (see Section 2.3.2) for the purposes of hardware and
software debug. The processor enters the Halt mode as the result of instruction
execution, or as the result of external controls. In the Halt mode, the processor neither
fetches nor executes instructions.

2.2.5.5 STEP (see Section 5.3.3) 

The Step mode allows a hardware-development system to step through processor
pipeline operation on a stage-by-stage basis. The Step mode is nearly identical to the
Halt mode, except that it enables the processor to enter the Executing mode while the
pipeline advances by one stage.

2.2.5.6 LOAD TEST INSTRUCTION (see Section 5.3.3) 

The Load Test Instruction mode permits a hardware-development system to access
data contained in the processor or system. This is accomplished by allowing a
hardware-development system to supply the processor with instructions, instead of having
the processor fetch instructions from instruction memory. The Load Test Instruction
mode is defined so that, once the processor has completed the execution of
instructions provided by the hardware-development system, it may resume the execution of
its normal instruction sequence.

2.2.5.7 TEST (see Section 5.3.4) 

The Test mode facilitates testing of hardware associated with the processor by
disabling processor outputs so that they may be driven directly by test hardware. The
Test mode also allows the addition of a second processor to a system, to monitor the
outputs of the first and signal detected errors.

2.2.5.8 RESET (see Section 3.9 and Section 5.5) 

The Reset mode provides initialization of certain processor registers and control state. 
This is used for power-on reset, for eliminating unrecoverable error conditions, and for 
supporting certain hardware-debug functions. 

2.3 SYSTEM INTERFACE OVERVIEW 

This section briefly describes the features of the Am29050 microprocessor that allow 
it to be connected to other system components. 

The two major interfaces of the Am29050 microprocessor, introduced in this section, 
are the channel and the Test/Development interface. The other topics briefly 
described here are clock generation, master/slave checking, and coprocessor 
attachment. 

Section 5.1 contains a complete pin description of the Am29050 microprocessor. 
Appendix A contains timing diagrams and related information. 

2.3.1 Channel (see Section 5.2) 

The Am29050 microprocessor channel consists of the following 32-bit buses and 
related controls: 

1. An Instruction Bus, which transfers instructions into the processor.

2. A Data Bus, which transfers data to and from the processor.

3. An Address Bus, which provides addresses for both instruction and data
accesses. The Address Bus also is used to transfer data to a coprocessor.

The channel performs accesses and data transfers to all external devices and
memories, including instruction and data memories, instruction caches, data caches,
input/output devices, bus converters, and coprocessors.

The channel defines three different access protocols: simple, pipelined, and
burst-mode. For simple accesses, the Am29050 microprocessor holds the address
valid throughout the entire access. This is appropriate for high-speed devices that can
complete an access in one cycle, and for low-cost devices that are accessed
infrequently (such as read-only memories containing initialization routines). Pipelined
and burst-mode accesses provide high performance with other types of devices and
memories.

For pipelined accesses, the address transfer is decoupled from the corresponding 
data or instruction transfer. After transmitting an address for a request, the processor 
may transmit one more address before receiving the reply to the first request. This 
allows address transfer and decoding to be overlapped with another access. 

On the other hand, burst-mode accesses eliminate the address-transfer cycle 
completely. Burst-mode accesses are defined so that once an address is transferred 
for a given access, subsequent accesses to sequentially increasing addresses may 
occur without re-transfer of the address. The burst may be terminated at any time by 
either the processor or responding device. 

The Am29050 microprocessor determines whether an access is simple, pipelined, or
burst-mode on a transfer-by-transfer (i.e., generally device-by-device) basis.
However, an access that begins as a simple access may be converted to a pipelined
or burst-mode access at any time during the transfer. This relaxes the timing
constraints on the channel-protocol implementation, since addressed devices do not
have to respond immediately to a pipelined or burst-mode request.

Except for the shared Address Bus, the channel maintains a strict division between
instruction and data accesses. In the most common situation, the system supplies the
processor with instructions using burst-mode accesses, with instruction addresses
transmitted to the system only when a branch occurs. Data accesses can occur
simultaneously without interfering with instruction transfer.

The Am29050 microprocessor contains arbitration logic to support other masters on
the channel. A single external master can arbitrate directly for the channel, while
multiple masters may arbitrate using a daisy chain or other method that requires no
additional arbitration logic. However, to increase arbitration performance in a
multiple-master configuration, an external channel arbiter should be used. This arbiter works in
conjunction with the processor's arbitration logic.

2.3.2 Test/Development Interface (see Section 5.3) 

The Am29050 microprocessor supports the attachment of a hardware-development
system such as an in-circuit emulator. This attachment is made directly to the
processor in the system under development, without the removal of the processor
from the system. The Test/Development Interface makes it possible for the
hardware-development system to gain control over the Am29050 microprocessor, and
inspect and modify its internal state (e.g., general-purpose register contents, TLB
entries, etc.). In addition, the Am29050 microprocessor can be used to access other
system devices and memories on behalf of the hardware-development system.

The Test/Development Interface consists of controls and status signals provided
on the Am29050 microprocessor, as well as the Instruction and Data buses. The Halt,
Step, Reset, and Load Test Instruction modes allow the hardware-development
system to control the operation of the Am29050 microprocessor. The hardware-development
system may supply the processor with instructions on the Instruction Bus using
the Load Test Instruction mode. Internal processor state can be inspected and
modified via the Data Bus.

2.3.3 Clocks (see Section 5.7) 

The Am29050 microprocessor generates and distributes a system clock at its 
operating frequency. This clock is specially designed to reduce skews between the 
system clock and the processor's internal clocks. The internal clock-generation 
circuitry requires a single-phase oscillator signal at twice the processor operating 
frequency. 

For systems in which processor-generated clocks are not appropriate, the Am29050 
microprocessor also can accept a clock from an external clock generator. 

The processor decides between these two clocking arrangements based on whether 
the power supply to the clock-output driver (PWRCLK) is tied to +5 volts or to 
GROUND. 

2.3.4 Master/Slave Operation (see Section 5.8) 

Each Am29050 microprocessor output has associated logic that compares the signal 
on the output with the signal that the processor is providing internally to the output 
driver. The processor signals situations where the output of any enabled driver does 
not agree with its input. 

For a single processor, the output comparison detects short circuits in output signals, 
but does not detect open circuits. It is possible to connect a second processor in 
parallel with the first, where the second processor has its outputs disabled due to the 
Test mode. The second processor detects open-circuit signals and also provides a
check of the outputs of the first processor.

2.3.5 Coprocessor Attachment (see Section 6.2)

A coprocessor for the Am29050 microprocessor attaches directly to the processor 
channel. However, this attachment has features that are different from those of other
channel devices. The coprocessor interface is designed to support a high 
operand-transfer rate and to support the overlap of coprocessor operations with other 
processor operations, including other external accesses. 

The coprocessor is assigned a special address space on the channel. This permits 
the transfer of operands and other information on the Address Bus without interfering
with normal addressing functions. Since both the Address Bus and Data Bus are used 
for data transfer, the Am29050 microprocessor can transfer 64 bits of information to 
the coprocessor in one cycle.

This chapter contains a formal description of the Am29050 microprocessor architecture.
It concentrates on the features of the Am29050 microprocessor and their logical
behavior. Chapter 7 discusses the use of some of these features.



3.1 PROGRAM MODES 



At any given time, the Am29050 microprocessor operates in one of three mutually 
exclusive program modes: the Supervisor mode, the User mode, or the Monitor 
mode. The Supervisor and User modes are for normal program execution; all system- 
protection features of the Am29050 microprocessor are based on the difference 
between these two modes. The Monitor mode is used for debugging. 



3.1.1 Supervisor Mode 



Unless it has been forced into the Monitor mode (see Section 3.7), the processor 
operates in the Supervisor mode whenever the Supervisor Mode (SM) bit of the Cur- 
rent Processor Status Register is 1 (see Section 3.2.3). In the Supervisor mode, 
executing programs have access to all processor resources. Virtual regions or pages
mapped by the Memory Management Unit, however, are protected from Supervisor
access (read, write, or execute) when the appropriate bit (SR, SW, or SE, respectively)
in the corresponding TLB Entry or Region Mapping Control register is 0 (see
Section 3.6.2).

During the address cycle of a channel request, the Supervisor mode is indicated by 
the SUP/US output being High. 



3.1.2 User Mode 



Unless it has been forced into the Monitor mode (see Section 3.7), the processor 
operates in the User mode whenever the SM bit in the Current Processor Status 
Register is 0. In the User mode, any of the following actions by an executing program 
causes a Protection Violation trap to occur: 

1. An attempted access of any TLB entry (see Section 3.2.4).

2. An attempted access of any general-purpose register for which a bit in the
Register Bank Protect Register is 1 (see Section 3.2.1).

3. An attempted execution of a load or store instruction for which the PA bit is 1, or
for which the UA bit is 1 (see Section 3.4.4). (The attempted execution of a
translated load or store for which the AS bit is 1 also causes a Protection Violation
trap. However, this trap occurs regardless of whether or not the processor is in the
User mode.)

4. An attempted execution of one of the following instructions: Interrupt Return,
Interrupt Return and Invalidate, Invalidate, or Halt. However, a hardware-development
system can disable protection checking for the Halt instruction, so
that this instruction may be used to implement instruction breakpoints in
User-mode programs (see Section 5.3.3).

5. An attempted access of one of the following registers: SR0-127, SR165-255 (see
Section 3.2.3).

6. An attempted execution of an Assert or Emulate instruction that specifies a
vector number between 0 and 63, inclusive (see Section 3.5.4).

7. An attempted access (read, write, or execute) in a virtual region or page mapped
by the Memory Management Unit, when the appropriate permission bit (UR, UW,
or UE, respectively) in the corresponding TLB Entry or Region Mapping Control
register is 0 (see Section 3.6.2).

Devices and memories on the channel also can implement protection and generate 
traps based on the value of the SM bit. During the address cycle of a channel request, 
the User mode is indicated by the SUP/US output being Low. 



3.1.3 Monitor Mode



The Monitor mode allows debugging of both Supervisor and User code (see Section 
3.7). The processor enters the Monitor mode whenever the DA bit in the Current 
Processor Status register is 1, and either a valid breakpoint comparison or a trap
occurs (except for a trap caused by TRAP(1-0)).

Upon entry into the Monitor mode, the read-only MM bit in the CPS register is set to 1,
and, if entry was caused by a trap, the Reason Vector register is set to the trap vector
number. Otherwise, the processor state is not modified. The values in the Shadow 
Program Counter registers are frozen. 

Executing an IRET instruction causes the processor to leave the Monitor mode. The 
processor resumes operation at the instruction addresses contained in the Shadow 
Program Counter registers. 

The Monitor mode can also be used by an external hardware debugger (see 
Section 5.3). 



3.2 VISIBLE REGISTERS 



The Am29050 microprocessor has four classes of registers that are accessible by 
instructions. These are general-purpose registers, floating-point accumulator regis- 
ters, special-purpose registers, and Translation Look-Aside Buffer (TLB) registers. 
Any operation available in the Am29050 microprocessor can be performed on the 
general-purpose registers, while only the floating-point multiply-accumulate and multi- 
ply-sum operations use the floating-point accumulator registers. Special-purpose 
registers and TLB registers are accessed only by explicit data movement to or from 
general-purpose registers. Various protection mechanisms prevent the access of 
some of these registers by User-mode programs. 

A summary of the information in this section appears in Appendix B.

3.2.1 General-Purpose Registers

The Am29050 microprocessor incorporates 192 general-purpose registers. The or- 
ganization of the general-purpose registers is diagrammed in Figure 3-1. 



Figure 3-1

General-purpose registers hold the following types of operands for program use:

1. 32-bit data addresses 

2. 32-bit signed or unsigned integers 

3. 32-bit branch-target addresses 

4. 32-bit logical bit strings 

5. 8-bit signed or unsigned characters 

6. 16-bit signed or unsigned integers 

7. Word-length Booleans 

8. Single-precision floating-point numbers 

9. Double-precision floating-point numbers (in two register locations) 

Because a large number of general-purpose registers are provided, a large amount of 
frequently used data can be kept on-chip, where access time is fastest. 

Am29050 microprocessor instructions can specify two general-purpose registers for 
source operands, and one general-purpose register for storing the instruction result. 
These registers are specified by three 8-bit instruction fields containing register num- 
bers. A register may be specified directly by the instruction, or indirectly by one of 
three special-purpose registers. 

3.2.1.1 REGISTER ADDRESSING

The general-purpose registers are partitioned into 64 global registers and 128 local 
registers, differentiated by the most-significant bit of the register number. The 
distinction between global and local registers is the result of register-addressing 
considerations. 

The following terminology is used to describe the addressing of general-purpose 
registers: 

1. Register number— this is a software-level number for a general-purpose register.
For example, this is the number contained in an instruction field. Register
numbers range from 0 to 255.

2. Global-register number— this is a software-level number for a global register.
Global-register numbers range from 0 to 127.

3. Local-register number— this is a software-level number for a local register.
Local-register numbers range from 0 to 127.

4. Absolute-register number— this is a hardware-level number used to select a
general-purpose register in the Register File. Absolute-register numbers range
from 0 to 255.

3.2.1.2 GLOBAL REGISTERS

When the most-significant bit of a register number is 0, a global register is selected. 
The seven least-significant bits of the register number give the global-register num- 
ber. For global registers, the absolute-register number is equivalent to the register 
number. 

Global registers 4 through 63 are not implemented. An attempt to access these regis- 
ters yields unpredictable results; however, they may be protected from User-mode 
access by the Register Bank Protect Register (see below). 

The register numbers associated with Global Registers 0, 1, 2, and 3 have special
meaning. The number for Global Register 0 specifies that an indirect pointer is to be
used as the source of the register number; there is an indirect pointer for each of the
instruction operand/result registers. Global Register 1 contains the Stack Pointer,
which is used in the addressing of local registers. The Condition Code Accumulator,
which is used to concatenate Boolean results from one or more operations into a
single condition code, is accessed through Global Registers 2 and 3.

3.2.1.3 LOCAL-REGISTER STACK POINTER 

The Stack Pointer is a 32-bit register that may be an operand of an instruction, like any
other general-purpose register. However, a shadow copy of Global Register 1 is
maintained by processor hardware to be used in local-register addressing. This 
shadow copy is set only with the results of Arithmetic and Logical instructions. If the 
Stack Pointer is set with the result of any other instruction class, local registers cannot 
be accessed predictably until the Stack Pointer is set once again with an Arithmetic or 
Logical instruction. 

A modification of the Stack Pointer has a delayed effect on the addressing of local 
registers, as discussed in Section 7.4.3. 

3.2.1.4 CONDITION CODE ACCUMULATOR REGISTER

The Condition Code Accumulator Register is accessed through Global Registers 2 
and 3. If Global Register 2 (CCA) is specified as the destination of an operation, then 
the 32-bit operation result is written to the Condition Code Accumulator Register. If 
Global Register 3 (CCA-shift) is the destination, then the Condition Code Accumulator 
Register is shifted left one bit and the most-significant bit of the operation result is 
placed in the least-significant bit of the register. The Condition Code Accumulator
Register contents can be read by specifying Global Register 2 as a source operand of
an operation. In this way, the Condition Code Accumulator Register can concatenate
the Boolean results of several operations into a single condition code, which can then
be used in subsequent operations.
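
As an illustration only, the accumulate-and-shift behavior described above can be modeled in a few lines of Python. The class and method names are hypothetical and do not come from the Am29050 documentation. The sketch also assumes that the bit shifted out of position 31 is discarded, which the text above does not state explicitly.

```python
MASK32 = 0xFFFFFFFF


class ConditionCodeAccumulator:
    """Toy software model of the Condition Code Accumulator Register."""

    def __init__(self) -> None:
        self.value = 0

    def write_gr2(self, result: int) -> None:
        # Global Register 2 (CCA) as destination: the 32-bit result is written.
        self.value = result & MASK32

    def write_gr3(self, result: int) -> None:
        # Global Register 3 (CCA-shift) as destination: shift left one bit and
        # place the most-significant bit of the result in bit 0.
        msb = (result >> 31) & 1
        self.value = ((self.value << 1) | msb) & MASK32  # assumed: bit 31 is dropped

    def read_gr2(self) -> int:
        # Global Register 2 as a source operand returns the accumulated value.
        return self.value
```

Writing three results through Global Register 3 after clearing the register leaves the three most-significant bits of those results, in order, in the low three bits of the accumulated value.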

3.2.1.5 LOCAL REGISTERS

When the most-significant bit of a register number is 1, a local register is selected.
The seven least-significant bits of the register number give the local-register number.
For local registers, the absolute-register number is obtained by adding the local-register
number to bits 8-2 of the Stack Pointer and truncating the result to seven bits; the
most-significant bit of the original register number is unchanged (i.e., it remains a 1).

The Stack Pointer addition applied to local-register numbers provides a limited form of 
base-plus-offset addressing within the local registers. The Stack Pointer contains the 
32-bit base address. This assists run-time storage management of variables for dy- 
namically nested procedures (see Section 7.1). 
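
The mapping from a register number to an absolute-register number, as described in this section and in Section 3.2.1.2, can be sketched as follows. This is a minimal model under stated assumptions: the function name is hypothetical, and register number 0 (which selects an indirect pointer rather than a register) is rejected instead of being modeled.

```python
def absolute_register(reg_num: int, stack_pointer: int) -> int:
    """Return the absolute-register number for a direct register number."""
    if not 0 <= reg_num <= 255:
        raise ValueError("register numbers range from 0 to 255")
    if reg_num == 0:
        # Global Register 0 requests an indirect access; not modeled here.
        raise ValueError("register number 0 selects an indirect pointer")
    if (reg_num & 0x80) == 0:
        # Global register: absolute number equals the register number.
        return reg_num
    local_num = reg_num & 0x7F               # seven least-significant bits
    sp_field = (stack_pointer >> 2) & 0x7F   # bits 8-2 of the Stack Pointer
    # Add, truncate to seven bits, and keep the most-significant bit as 1.
    return 0x80 | ((local_num + sp_field) & 0x7F)
```

For example, with a Stack Pointer value of 8 (bits 8-2 equal to 2), local register 127 (register number 255) wraps around to absolute register 129.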

3.2.1.6 REGISTER BANKING

For the purpose of access restriction, the general-purpose registers are divided into
register banks. Register banks consist of 16 registers (except for Bank 0, which contains
registers 4 through 15), and are partitioned according to absolute-register numbers,
as shown in Figure 3-2.

The Register Bank Protect Register contains 16 protection bits, where each bit controls
User-mode accesses (read or write) to a bank of registers. Bits 0-15 of the Register
Bank Protect Register protect register banks 0 through 15, respectively.

When a bit in the Register Bank Protect Register is 1, and a register in the corresponding
bank is specified as an operand register or result register by a User-mode
instruction, a Protection Violation trap occurs. Note that protection is based on
absolute-register numbers; in the case of local registers, Stack-Pointer addition is
performed before protection checking.

Figure 3-2

When the processor is in Supervisor or Monitor mode, the Register Bank Protect 
Register has no effect on general-purpose register accesses. 
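
The bank-protection check can be sketched as below. The function names are hypothetical, and the sketch assumes that Bank k covers absolute registers 16k through 16k+15, which is consistent with the description of Bank 0 above but is otherwise taken from the figure rather than from the text.

```python
def register_bank(abs_reg: int) -> int:
    """Bank number (0-15) for an absolute-register number (assumed 16 per bank)."""
    if not 0 <= abs_reg <= 255:
        raise ValueError("absolute-register numbers range from 0 to 255")
    return abs_reg >> 4


def user_access_traps(abs_reg: int, rbp: int) -> bool:
    """True if a User-mode access to abs_reg causes a Protection Violation.

    rbp is the 16-bit Register Bank Protect value; bit k protects bank k.
    The check applies to the absolute-register number, i.e. after any
    Stack-Pointer addition for local registers.
    """
    return bool((rbp >> register_bank(abs_reg)) & 1)
```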

3.2.1.7 INDIRECT ACCESSES 

Specification of Global Register 0 as an instruction-operand register or result register
causes an indirect access to the general-purpose registers. In this case, the
absolute-register number is provided by an indirect pointer contained in a special-purpose
register.

Each of the three possible registers for instruction execution has an associated 8-bit
indirect pointer. Indirect register numbers can be selected independently for each of
the three operands. Since the indirect pointers contain absolute-register numbers, the
number in an indirect pointer is not added to the Stack Pointer when local registers
are selected.

The indirect pointers are set by the Move To Special Register, DIVIDE, DIVIDU, 
SETIP, and EMULATE instructions. The indirect pointers are also set by Floating- 
Point, MULTIPLY, MULTM, MULTIPLU, and MULTMU instructions when these cause
exceptions. This allows the exception handler to access the instruction operands.

For a Move-To-Special-Register instruction, an indirect pointer is set with bits 9-2 of 
the 32-bit source operand. This provides consistency between the addressing of
words in general-purpose registers and the addressing of words in external devices or 
memories. A modification of an indirect pointer using a Move To Special Register has 
a delayed effect on the addressing of general-purpose registers, as discussed in 
Section 7.4.3. 
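
The bit selection used by a Move-To-Special-Register write to an indirect pointer can be illustrated with a one-line helper. The function name is hypothetical. The example values assume the usual word-addressing convention in which register N corresponds to byte offset 4N, which is the consistency the paragraph above refers to.

```python
def indirect_pointer_from_operand(source: int) -> int:
    """Extract bits 9-2 of a 32-bit source operand as an 8-bit register number."""
    return (source >> 2) & 0xFF


# A byte offset of 4 * 129 selects absolute register 129 (0x81).
pointer = indirect_pointer_from_operand(4 * 129)
```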

For the remaining instructions, all three indirect pointers are set simultaneously with
the absolute-register numbers derived from the register numbers specified by the
instruction. For any local registers selected by the instruction, the Stack-Pointer
addition is applied to the register numbers before the indirect pointers are set.

Register numbers stored into the indirect pointers are checked for bank-protection 
violations— except when an indirect pointer is set by a Move-To-Special-Register 
instruction — at the time that the indirect pointers are set. 

3.2.2 Floating-Point Accumulator Registers 

Four 64-bit Accumulator Registers ACC(3-0) are provided for use with the floating- 
point multiply-accumulate (FMAC, DMAC) and multiply-sum (FMSM, DMSM) opera- 
tions. These registers can contain either single- or double-precision floating-point 
numbers. 

The Accumulator Registers are written with the Move To Accumulator (MTACC) 
instruction and read with the Move From Accumulator (MFACC) instruction. Any of the 
four Accumulator Registers can be used as a source or destination for the
multiply-accumulate operations. ACC0 can also be used as a source for the multiply-sum
operations (see Section 3.3.7).

3.2.3 Special-Purpose Registers 

The Am29050 microprocessor contains 39 special-purpose registers. The organiza- 
tion of the special-purpose registers is shown in Figure 3-3. 

Special-purpose registers provide controls and data for certain processor operations. 
Some special-purpose registers are updated dynamically by the processor, independ- 
ent of software controls. Because of this, a read of a special-purpose register follow- 
ing a write does not necessarily get the data that was written. 

Some special-purpose registers have fields that are reserved for future processor 
implementations. When a special-purpose register is read, a bit in a reserved field is 
read as a 0. An attempt to write a reserved bit with a 1 has no effect; however, this 
should be avoided because of upward-compatibility considerations. 

The special-purpose registers are accessed by explicit data movement only. Instruc- 
tions that move data to or from a special-purpose register specify the special-purpose 
register by an 8-bit field containing a special-purpose register number. Register num- 
bers are specified directly by instructions. 

The special-purpose registers are partitioned into protected and unprotected regis- 
ters. Special-purpose registers numbered 0-127 and 165-255 are protected (note 
that not all of these are implemented). Special-purpose registers numbered 128-164 
are unprotected (again, not all are implemented). 

An attempted read of an unimplemented special-purpose register yields an unpredict- 
able value. An attempted write of an unimplemented, protected special-purpose regis- 
ter has an unpredictable effect on processor operation. An attempted write of an 
unimplemented, unprotected special-purpose register has no effect; however, this 
should be avoided because of upward-compatibility considerations. 

Unprotected special-purpose registers are accessible by programs executing in the 
User, Supervisor, and Monitor modes. 
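
The protected and unprotected number ranges given above reduce to a simple range test. The following sketch uses a hypothetical function name and says nothing about which register numbers are actually implemented.

```python
def spr_is_protected(spr_num: int) -> bool:
    """True if the special-purpose register number lies in a protected range."""
    if not 0 <= spr_num <= 255:
        raise ValueError("special-purpose register numbers are 8 bits wide")
    # Protected: 0-127 and 165-255. Unprotected: 128-164.
    return not 128 <= spr_num <= 164
```

A User-mode access to any register number for which this returns True would cause a Protection Violation trap, as listed in Section 3.1.2.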

Figure 3-3

3.2.3.1 VECTOR AREA BASE ADDRESS (VAB, REGISTER 0)

This protected special-purpose register (see Figure 3-4) specifies the beginning
address of the interrupt/trap Vector Area. The Vector Area is either a table of 256 vectors
that point to interrupt and trap handling routines, or a segment of 256
64-instruction blocks that directly contain the interrupt and trap handling routines.



Figure 3-4

The organization of the Vector Area is determined by the Vector Fetch (VF) bit of the
Configuration Register. If the VF bit is 1 when an interrupt or trap is taken, the vector
number for the interrupt or trap (see Section 3.5.4) replaces bits 9-2 of the value in
the Vector Area Base Address Register to generate the physical address for a vector
contained in instruction/data memory.

If the VF bit is 0, the vector number replaces bits 15-8 of the value in the Vector Area
Base Address Register to generate the physical address of the first instruction of the
interrupt or trap handler. The instruction fetch for this instruction is directed either to
instruction memory or instruction read-only memory as determined by the ROM Vector
Area (RV) bit of the Configuration Register.

Bits 31-10: Vector Area Base (VAB)— The VAB field gives the beginning physical
address of the Vector Area. This address is constrained to begin on a 1-Kbyte address
boundary in instruction/data memory or instruction read-only memory. If the Vector
Area is an instruction segment, bits 10-15 are ignored, and the alignment is forced to
a 64-Kbyte boundary.

Bits 9-0: Zeros— These bits force the alignment of the Vector Area to a 1-Kbyte
boundary.
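
The two address computations described above can be sketched as follows. The function names are hypothetical. The sketch models only the bit replacement stated in the text (bits 9-2 when VF is 1, bits 15-8 when VF is 0) and does not model the memory access itself.

```python
VAB_FIELD_MASK = 0xFFFFFC00  # bits 31-10; bits 9-0 of the register are zeros


def vector_entry_address(vab: int, vector_number: int) -> int:
    """VF = 1: physical address of the vector for the given vector number."""
    if not 0 <= vector_number <= 255:
        raise ValueError("vector numbers are 8 bits wide")
    base = vab & VAB_FIELD_MASK
    # The vector number replaces bits 9-2 (one 4-byte vector per number).
    return (base & ~0x3FC) | (vector_number << 2)


def handler_block_address(vab: int, vector_number: int) -> int:
    """VF = 0: physical address of the first instruction of the handler."""
    if not 0 <= vector_number <= 255:
        raise ValueError("vector numbers are 8 bits wide")
    base = vab & VAB_FIELD_MASK
    # The vector number replaces bits 15-8, so bits 15-10 of VAB are ignored
    # and each handler occupies a 64-instruction (256-byte) block.
    return (base & ~0xFF00) | (vector_number << 8)
```

With a base value of 0x00012400 and vector number 5, the first function yields 0x00012414 and the second yields 0x00010500, which shows the forced 64-Kbyte alignment in the second case.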

3.2.3.2 OLD PROCESSOR STATUS (OPS, REGISTER 1) 

This protected special-purpose register has the same format as the Current Processor
Status described below. The Old Processor Status stores a copy of the Current
Processor Status when an interrupt or trap is taken. This is required since the Current
Processor Status will be modified to reflect the status of the interrupt/trap handler.

During an interrupt return, the Old Processor Status is copied into the Current Processor
Status. This allows the Current Processor Status to be set as required for the
routine that is the target of the interrupt return.

3.2.3.3 CURRENT PROCESSOR STATUS (CPS, REGISTER 2)

This protected special-purpose register (see Figure 3-5) controls the behavior of the
processor and its ability to recognize exceptional events.

Bits 31-17: Reserved.

Bit 16: Monitor Mode (MM)— This read-only bit is set by the processor upon entry
into the Monitor mode, and reset on exit. The MM bit has no counterpart in the Old
Processor Status Register.

Bit 15: Coprocessor Active (CA)— The CA bit is set and reset under the control of
load and store instructions that transfer information to and from a coprocessor. This
bit indicates that the coprocessor is performing an operation at the time that an
interrupt or trap is taken. This notifies the interrupt or trap handler that the coprocessor
contains state information to be preserved. Note that this notification occurs because
the CA bit of the Old Processor Status is 1 in this case, not because of the value of
the CA bit of the Current Processor Status.

Bit 14: Interrupt Pending (IP)— This bit allows software to detect the presence of
external interrupts while they are disabled. The IP bit is set if one or more of the
external signals INTR(3-0) is active, but the processor is disabled from taking the resulting
interrupt due to the value of the DA, DI, or IM bits. If all external interrupt signals
subsequently are de-asserted while still disabled, the IP bit is reset.

Bits 13-12: Trace Enable, Trace Pending (TE, TP)— The TE and TP bits implement 
a software-controlled, instruction single-step facility. Single stepping is not imple- 
mented directly, but rather emulated by trap sequences controlled by these bits. The 
value of the TE bit is copied to the TP bit whenever an instruction completes execu- 
tion. When the TP bit is 1, a Trace trap occurs. Section 3.7.1 describes the use of 
these bits in more detail. 

Bit 11: Trap Unaligned Access (TU)— The TU bit enables checking of address
alignment for external data-memory accesses. When this bit is 1, an Unaligned Access
trap occurs if the processor either generates an address for an external word
that is not aligned on a word address boundary (i.e., either of the least-significant two
bits is 1), or generates an address for an external half-word that is not aligned on a
half-word address boundary (i.e., the least-significant address bit is 1). When the TU
bit is 0, data-memory address alignment is ignored.

Alignment is ignored for input/output accesses and coprocessor transfers. The
alignment of instruction addresses is also ignored (unaligned instruction addresses can be
generated only by indirect jumps). Interrupt/trap vector addresses always are aligned
properly.
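
The alignment rule for external data-memory accesses can be expressed as a short predicate. The function name and the use of a byte count to identify the access width are assumptions made for illustration; input/output accesses and coprocessor transfers, for which alignment is ignored, are outside this sketch.

```python
def unaligned_access_trap(address: int, size_bytes: int, tu_bit: int) -> bool:
    """True if an external data-memory access causes an Unaligned Access trap."""
    if not tu_bit:
        return False                      # TU = 0: alignment is ignored
    if size_bytes == 4:
        return (address & 0x3) != 0       # word: either of the low two bits is 1
    if size_bytes == 2:
        return (address & 0x1) != 0       # half-word: least-significant bit is 1
    return False                          # byte accesses are always aligned
```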

Bit 10: Freeze (FZ)— The FZ bit prevents certain registers from being updated during 
interrupt and trap processing, except by explicit data movement. The affected regis- 
ters are: Channel Address, Channel Data, Channel Control, Program Counter 0, 
Program Counter 1, Program Counter 2, and the ALU Status Register.

When the FZ bit is 1, these registers hold their values. An affected register can be
changed only by a Move-To-Special-Register instruction. When the FZ bit is 0, there
is no effect on these registers, and they are updated by processor instruction
execution as described in this manual.

The FZ bit is set whenever an interrupt or trap is taken, holding critical state in the
processor so that it is not modified unintentionally by the interrupt or trap handler.

Bit 9: Lock (LK)— The LK bit controls the value of the LOCK external signal. If the LK
bit is 1, the LOCK signal is active. If the LK bit is 0, the LOCK signal is controlled by
the execution of the instructions Load and Set, Load and Lock, and Store and Lock.
This bit is provided for the implementation of multi-processor synchronization
protocols.

Bit 8: ROM Enable (RE)— The RE bit enables instruction fetching from external 
instruction read-only memory (ROM). When this bit is 1, the IREQT signal directs all 
instruction requests to ROM. Instructions that are fetched from ROM are subject to 
capture and re-use by the Branch Target Cache memory when it is enabled; the 
Branch Target Cache memory distinguishes between instructions from ROM and 
those from non-ROM storage. When this bit is 0, off-chip requests for instructions are 
directed to Instruction/data memory. 

Bit 7: WAIT Mode (WM) — The WM bit places the processor in the Wait mode. When 
this bit is 1, the processor performs no operations. The Wait mode is reset by an
interrupt or trap for which the processor is enabled, or by the Reset mode. 

Bit 6: Physical Addressing/Data (PD)— The PD bit determines whether address 
translation is performed for load or store operations. Address translation is performed 
for an access only when this bit is 0, and the Physical Address (PA) bit in the load or 
store instruction causing the access is also 0. 

Bit 5: Physical Addressing/Instructions (PI) — The PI bit determines whether ad- 
dress translation is performed for external instruction accesses. Address translation is 
performed only when this bit is 0. 

Bit 4: Supervisor Mode (SM) — The SM bit protects certain processor context, such 
as protected special-purpose registers. When this bit is 1, the processor is in the 
Supervisor mode, and access to all processor context is allowed. When this bit is 0, 
the processor is in the User mode, and access to protected processor context is not 
allowed; an attempt to access (either read or write) protected processor context 
causes a Protection Violation trap. 

Section 3.1 describes the processor state protected from User-mode access. 

For an external access, the User Access (UA) bit in the load or store instruction also 
controls access to protected processor context. When the UA bit is 1, the Memory
Management Unit and channel perform the access as if the program causing the 
access were in User mode. 

Bits 3-2: Interrupt Mask (IM) — The IM field is an encoding of the processor priority 
with respect to external interrupts. The interpretation of the interrupt mask is specified
by the following table:

IM Value    Result

00          INTR0 enabled
01          INTR(1-0) enabled
10          INTR(2-0) enabled
11          INTR(3-0) enabled

Bit 1: Disable Interrupts (DI) — The DI bit prevents the processor from being
interrupted by external interrupt requests INTR(3-0). When this bit is 1, the processor
ignores all external interrupts. However, note that traps (both internal and external),
Timer interrupts, and Trace traps will be taken. When this bit is 0, the processor will
take any interrupt enabled by the IM field, unless the DA bit is 1.

Bit 0: Disable All Interrupts and Traps (DA)— The DA bit prevents the processor
from taking any interrupts and most traps. When this bit is 1, the processor ignores
interrupts and traps, except for the WARN, Instruction Access Exception, Data Access
Exception, and Coprocessor Exception traps. The processor enters the Monitor mode
when the DA bit is 1, and either a valid breakpoint comparison or a trap (except for a
trap caused by TRAP(1-0)) occurs. When the DA bit is 0, all traps will be taken, and
interrupts will be taken if otherwise enabled.
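
The combined effect of the IM, DI, and DA fields on the external interrupt inputs can be sketched as below. The function name is hypothetical. The sketch covers only INTR(3-0); Timer interrupts, traps, and Monitor-mode entry are deliberately left out.

```python
def enabled_external_interrupts(cps: int) -> list[int]:
    """Return the INTR line numbers the processor will take for a CPS value."""
    da = cps & 0x1            # bit 0: Disable All Interrupts and Traps
    di = (cps >> 1) & 0x1     # bit 1: Disable Interrupts
    im = (cps >> 2) & 0x3     # bits 3-2: Interrupt Mask
    if da or di:
        return []             # external interrupts are ignored
    # IM = 0 enables INTR0 only; IM = 3 enables INTR(3-0).
    return list(range(im + 1))
```

For instance, a CPS value whose low four bits are 1000 (IM = 2, DI = 0, DA = 0) enables INTR(2-0), while setting either DI or DA leaves no external interrupt enabled regardless of IM.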

3.2.3.4 CONFIGURATION (CFG, REGISTER 3) 

This protected special-purpose register (see Figure 3-6) controls certain processor 
and system options. Most fields normally are modified only during system initializa- 
tion. The Configuration Register is defined as follows: 



Figure 3-6

Bits 31-24: Processor Release Level (PRL)— The PRL field is an 8-bit, read-only
identification number which specifies the processor version.

Bits 23-8: Reserved.

Bit 7: Early Load Enable (EE)— The EE bit determines whether the Early Load facility
is enabled. When this bit is 1, early loads are permitted to take place; when this bit
is 0, the generation of early load addresses by either the Physical Address Cache or
the Early Address Generator is disabled.

Bit 6: Branch Target Cache Memory Organization (CO)— The CO bit determines
the organization of the Branch Target Cache memory (BTC memory). When this bit is
0, the BTC memory is organized into 64 entries of 4 instructions each. When this bit is
1, the BTC memory is organized into 128 entries of 2 instructions each. The CO bit is
initialized to 0 on reset.

Bit 5: Data Width (DW)— The DW bit enables and disables byte and half-word external
accesses. If the DW bit is 0, byte and half-word accesses are not performed in
hardware, and these accesses must be emulated by software. If the DW bit is 1, byte
and half-word accesses are performed by hardware; this requires that external devices
and memories be able to write individual bytes and half-words within a word.
The DW bit is initialized to 0 on reset.

Bit 4: Vector Fetch (VF) — The VF bit determines the structure of the interrupt/trap
Vector Area. If this bit is 1, the Vector Area is defined as a block of 256 vectors which
specify the beginning addresses of the interrupt and trap handling routines. If the VF
bit is 0, the Vector Area is a segment of 256 64-instruction blocks that contain the
actual routines.

Bit 3: ROM Vector Area (RV)— If the VF bit is 0, the RV bit specifies whether the
Vector Area is contained in instruction memory (RV = 0) or instruction read-only memory
(RV = 1). The value of the RV bit is irrelevant if the VF bit is 1.



Bit 2: Byte Order (BO)— The BO bit determines the ordering of bytes and half-words
within words. If the BO bit is 0, bytes and half-words are numbered left-to-right within
a word. If the BO bit is 1, bytes and half-words are numbered right-to-left. Section
3.4.5 describes the interpretation of the BO bit in more detail.

Bit 1: Coprocessor Present (CP)— The CP bit indicates the presence of a coprocessor
that may be used by the processor. If this bit is 1, it enables the execution of load
and store instructions that have a Coprocessor Enable (CE) bit of 1. If the CP bit is 0
and the processor attempts to execute a load or store instruction with a CE bit of 1, a
Coprocessor Not Present trap occurs. This feature may be used to emulate coprocessor
operations as well as to protect the state of a coprocessor shared between multiple
processes.

Bit 0: Branch Target Cache Memory Disable (CD)— The CD bit determines whether
or not the Branch Target Cache memory is used for non-sequential instruction
references. When this bit is 1, all instruction references are directed to external instruction
memory or instruction ROM, and the Branch Target Cache memory is not used. When
this bit is 0, the targets of non-sequential instruction fetches are stored in the Branch
Target Cache memory and re-used as described in Section 4.2.2. The value of the
CD bit does not take effect until the execution of the next branch instruction. The CD
bit is initialized to 1 on reset.
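
As a reading aid, the Configuration Register fields described above can be unpacked with a small helper. The function name, the dictionary keys, and the sample value are hypothetical. The bit positions are those given in the field descriptions, and the derived Branch Target Cache geometry follows the CO bit description.

```python
def decode_cfg(cfg: int) -> dict[str, int]:
    """Split a 32-bit Configuration Register value into its named fields."""
    co = (cfg >> 6) & 1
    return {
        "PRL": (cfg >> 24) & 0xFF,   # bits 31-24: Processor Release Level
        "EE": (cfg >> 7) & 1,        # bit 7: Early Load Enable
        "CO": co,                    # bit 6: BTC memory organization
        "DW": (cfg >> 5) & 1,        # bit 5: Data Width
        "VF": (cfg >> 4) & 1,        # bit 4: Vector Fetch
        "RV": (cfg >> 3) & 1,        # bit 3: ROM Vector Area
        "BO": (cfg >> 2) & 1,        # bit 2: Byte Order
        "CP": (cfg >> 1) & 1,        # bit 1: Coprocessor Present
        "CD": cfg & 1,               # bit 0: BTC memory Disable
        # CO = 0: 64 entries of 4 instructions; CO = 1: 128 entries of 2.
        "btc_entries": 128 if co else 64,
        "btc_instructions_per_entry": 2 if co else 4,
    }


fields = decode_cfg(0x03000051)  # arbitrary sample value, not a documented reset state
```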

3.2.3.5 CHANNEL ADDRESS (CHA, REGISTER 4)

This protected special-purpose register (Figure 3-7) is used to report exceptions 
during external accesses or coprocessor transfers. It also is used to restart inter- 
rupted load-multiple and store-multiple operations, and to restart other external ac- 
cesses when possible (e.g., after TLB misses are serviced). The restarting of external 
accesses is described in Section 7.3.4.



Figure 3-7 Channel Address Register

The Channel Address Register is updated on the execution of every load or store
instruction, and on every load or store in a load-multiple or store-multiple sequence,
except when the Freeze (FZ) bit in the Current Processor Status Register is 1.

Bits 31-0: Channel Address (CHA)— This field contains the address of the current 
channel transaction (if the FZ bit of the Current Processor Status Register is 0). For 
external data accesses, the address is virtual if address translation was enabled for 
the access, or physical if translation was disabled. For transfers to the coprocessor, 
the CHA field contains data transferred to the coprocessor. 

3.2.3.6 CHANNEL DATA (CHD, REGISTER 5) 

This protected special-purpose register (Figure 3-8) is used to report exceptions 
during external accesses or coprocessor transfers. It is also used to restart the first 
store of an interrupted store-multiple operation and to restart other external accesses 
when possible (e.g., after TLB misses are serviced). The restarting of external
accesses is described in Section 7.3.4.

Figure 3-8 Channel Data Register

The Channel Data Register is updated on the execution of every load or store 
instruction, and on every load or store in a load-multiple or store-multiple sequence,
except when the Freeze (FZ) bit in the Current Processor Status Register is 1. When 
the Channel Data Register is updated for a load operation, the resulting value is 
unpredictable. 

Bits 31-0: Channel Data (CHD)— This field contains the data (if any) associated with 
the current channel transaction (if the FZ bit of the Current Processor Status Register 
is 0). If the current channel transaction is not a store or a transfer to the coprocessor, 
the value of this field is irrelevant. 

3.2.3.7 CHANNEL CONTROL (CHC, REGISTER 6) 

This protected special-purpose register (Figure 3-9) is used to report exceptions 
during external accesses or coprocessor transfers. It also is used to restart inter- 
rupted load-multiple and store-multiple operations, and to restart other external ac- 
cesses when possible (e.g., after TLB misses are serviced). The restarting of external 
accesses is described in Section 7.3.4.

The Channel Control Register is updated on the execution of every load or store
instruction, and on every load or store in a load-multiple or store-multiple sequence,
except when the Freeze (FZ) bit in the Current Processor Status Register is 1.

Bits 31-24:— These bits are a direct copy of bits 23-16 from the load or store 
instruction which started the current channel transaction (see Section 3.4.4 and 
Section 6.1.2). 

Bits 23-16: Load/Store Count Remaining (CR)— The CR field indicates the remain- 
ing number of transfers for a load-multiple or store-multiple operation that encoun- 
tered an exception or was interrupted before completion. This number is zero- 
based; for example, a value of 28 in this field indicates that 29 transfers remain to be 
completed. 

Bit 15: Load/Store (LS)— The LS bit is 0 if the channel transaction is a store
operation, and 1 if it is a load operation.

Bit 14: Multiple Operation (ML)— The ML bit is 1 if the current channel transaction is
a partially-complete load-multiple or store-multiple operation; otherwise it is 0.

Bit 13: Set (ST)— The ST bit is 1 if the current channel transaction is for a Load and
Set instruction; otherwise it is 0.

Bit 12: Lock Active (LA)— The LA bit is 1 if the current channel transaction is for a
Load and Lock or Store and Lock instruction; otherwise it is 0. Note that this bit is not
set as the result of the Lock (LK) bit in the Current Processor Status Register.

Bit 11: Reserved. 

Bit 10: Transaction Faulted (TF)— The TF bit indicates that the current channel
transaction did not complete due to some exceptional circumstance. This bit is set
only for exceptions reported via the DERR input, and it causes a Data Access Exception
or Coprocessor Exception trap to occur (depending on the value of the CE bit)
when it is 1.

The TF bit allows the proper sequencing of externally reported errors that get 
preempted by higher-priority traps (see Section 3.5.8). It is reset by software that 
handles the resulting trap. 

Bits 9-2: Target Register (TR)— The TR field indicates the absolute-register number
of the data operand for the current transaction (either a load target or store data source).
Since the register number in this field is absolute, it reflects the Stack-Pointer addition
when the indicated register is a local register.

Bit 1: Not Needed (NN)— The NN bit indicates that, even though the Channel Address,
Channel Data, and Channel Control registers contain a valid representation of
an incomplete load operation, the data requested is not needed. This situation arises 
when a load instruction is overlapped with an instruction which writes the load target 
register. 

Bit 0: Contents Valid (CV)— The CV bit indicates that the contents of the Channel
Address, Channel Data, and Channel Control registers are valid. 
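
The field layout above can be sketched as a small decoder. This is only an illustration of the bit positions listed in this section; the type and function names (chc_fields, chc_decode, chc_transfers_remaining) are hypothetical and do not come from the processor documentation. Bits 31-24 and the reserved bit 11 are not decoded.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of the Channel Control Register fields described above.
 * All names here are illustrative assumptions. */
typedef struct {
    uint8_t cr;  /* bits 23-16: zero-based count of transfers remaining */
    bool    ls;  /* bit 15: 1 = load, 0 = store */
    bool    ml;  /* bit 14: partially complete load-multiple/store-multiple */
    bool    st;  /* bit 13: Load and Set */
    bool    la;  /* bit 12: Load and Lock or Store and Lock */
    bool    tf;  /* bit 10: transaction faulted (reported via DERR) */
    uint8_t tr;  /* bits 9-2: absolute-register number of the data operand */
    bool    nn;  /* bit 1: requested load data not needed */
    bool    cv;  /* bit 0: CHA, CHD and CHC contents valid */
} chc_fields;

/* Split a raw 32-bit register value into its fields. */
static chc_fields chc_decode(uint32_t chc)
{
    chc_fields f;
    f.cr = (uint8_t)((chc >> 16) & 0xFFu);
    f.ls = ((chc >> 15) & 1u) != 0;
    f.ml = ((chc >> 14) & 1u) != 0;
    f.st = ((chc >> 13) & 1u) != 0;
    f.la = ((chc >> 12) & 1u) != 0;
    f.tf = ((chc >> 10) & 1u) != 0;
    f.tr = (uint8_t)((chc >> 2) & 0xFFu);
    f.nn = ((chc >> 1) & 1u) != 0;
    f.cv = (chc & 1u) != 0;
    return f;
}

/* CR is zero-based: a CR value of 28 means 29 transfers remain. */
static unsigned chc_transfers_remaining(const chc_fields *f)
{
    return (unsigned)f->cr + 1u;
}
```

A trap handler written along these lines would typically test cv first, since the other fields are meaningful only when the register contents are valid.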

3.2.3.8 REGISTER BANK PROTECT (RBP, REGISTER 7)

This protected special-purpose register (Figure 3-10) protects banks of general- 
purpose registers from User-mode program accesses. 



Figure 3-10

The general-purpose registers are partitioned into 16 banks of 16 registers each
(except that Bank 0 contains 12 registers). The banks are organized as shown in
Figure 3-2 of Section 3.2.1.

Bits 31-16: Reserved.

Bits 15-0: Bank 15 through Bank 0 Protection Bits (B15-B0)— In the Register
Bank Protect Register, each bit is associated with a particular bank of registers, and
the bit number gives the associated bank number (e.g., B11 determines the protection
for Bank 11).

When a protection bit is 1, the corresponding bank is protected from access by programs
executing in the User mode. A Protection Violation trap occurs when a User-mode
program attempts to access (either read or write) a register in a protected bank.

When a bit in this register is 0, the corresponding bank is available to programs
executing in the User mode.

Supervisor-mode and Monitor-mode programs are not affected by the Register Bank 
Protect Register. 

Register protection is based on absolute-register numbers. For local registers, the 
protection checking is performed after the Stack-Pointer addition is performed. 
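
The protection check can be sketched as follows. This is a minimal illustration, assuming that banks are consecutive groups of 16 absolute-register numbers so that the bank number is the register number divided by 16 (the layout of Figure 3-2 is not reproduced here). The function name rbp_user_access_trapped is hypothetical, and the caller is assumed to pass the absolute-register number obtained after any Stack-Pointer addition.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if a User-mode read or write of the given absolute register
 * would cause a Protection Violation trap under the given Register Bank
 * Protect value. Assumes bank n covers absolute registers 16*n to 16*n+15. */
static bool rbp_user_access_trapped(uint32_t rbp, uint8_t abs_reg)
{
    unsigned bank = abs_reg >> 4;        /* 0..15 */
    return ((rbp >> bank) & 1u) != 0;    /* protection bit B<bank> */
}
```

Supervisor-mode and Monitor-mode accesses would bypass this check entirely, as stated above.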

3.2.3.9 TIMER COUNTER (TMC, REGISTER 8)

This protected special-purpose register (Figure 3-11) contains the counter for the 
Timer Facility. 



Figure 3-11

Bits 31-24: Reserved.

Bits 23-0: Timer Count Value (TCV)— The 24-bit TCV field decrements by one on
each processor clock. When the TCV field decrements to zero, it is reloaded with the
content of the Timer Reload Value field in the Timer Reload Register. At this time, the
Interrupt bit in the Timer Reload Register is set.

3.2.3.10 TIMER RELOAD (TMR, REGISTER 9)

This protected special-purpose register (Figure 3-12) maintains synchronization of the 
Timer Counter Register, enables Timer interrupts, and maintains Timer Facility status 
information. 



Figure 3-12

Bits 31-27: Reserved.

Bit 26: Overflow (OV)— The OV bit indicates that a Timer interrupt occurred before a
previous Timer interrupt was serviced. It is set if the Interrupt (IN) bit is 1 (see below) 
when the Timer Count Value (TCV) field of the Timer Counter Register decrements to 
zero. In this case, a Timer interrupt caused by the IN bit has not been serviced when 
another interrupt is created. 

Bit 25: Interrupt (IN)— The IN bit is set whenever the TCV field decrements to zero. If
this bit is 1 and the IE bit is also 1, a Timer interrupt occurs. Note that the IN bit is set
when the TCV field decrements to zero, regardless of the value of the IE bit. The IN
bit is reset by software that handles the Timer interrupt.

The TCV field is zero-based with respect to the Timer interrupt interval; for example, a
value of 28 in the TCV field causes the IN bit to be set in the 29th subsequent processor
cycle. The reason for this is that the TCV field is zero for a complete cycle before
the IN bit is set.

Bit 24: Interrupt Enable (IE)— When the IE bit is 1, the Timer interrupt is enabled,
and the Timer interrupt occurs whenever the IN bit is 1. When this bit is 0, the Timer
interrupt is disabled. Note that Timer interrupts may be disabled by the DA bit of the
Current Processor Status Register regardless of the value of the IE bit.

Bits 23-0: Timer Reload Value (TRV)— The value of this field is written into the
Timer Count Value (TCV) field of the Timer Counter Register when the TCV field 
decrements to zero. 
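
The interaction of the TCV, TRV, IN, and OV fields can be illustrated with a small software model. This is a sketch under stated assumptions, not a description of the hardware: the names timer_model, timer_tick, and timer_interrupt_pending are hypothetical, the model assumes the reload and the setting of IN happen on the clock after TCV has been zero for one full cycle (consistent with the zero-based interval described above), and the DA bit of the Current Processor Status Register is not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Software model of the Timer Facility fields; names are illustrative. */
typedef struct {
    uint32_t tcv;  /* Timer Count Value (24 bits) */
    uint32_t trv;  /* Timer Reload Value (24 bits) */
    bool     in;   /* Interrupt bit */
    bool     ov;   /* Overflow bit */
    bool     ie;   /* Interrupt Enable bit */
} timer_model;

/* Advance the model by one processor clock. */
static void timer_tick(timer_model *t)
{
    if (t->tcv == 0) {
        /* TCV has been zero for a complete cycle: flag and reload. */
        if (t->in)
            t->ov = true;            /* earlier interrupt not yet serviced */
        t->in  = true;
        t->tcv = t->trv & 0xFFFFFFu;
    } else {
        t->tcv--;
    }
}

/* A Timer interrupt is requested when both IN and IE are 1. */
static bool timer_interrupt_pending(const timer_model *t)
{
    return t->in && t->ie;
}
```

In this model, starting with a TCV of 28 sets IN on the 29th call to timer_tick, so a reload value of N - 1 gives an interval of N cycles.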

3.2.3.11 PROGRAM COUNTER (PC0, REGISTER 10)

This protected special-purpose register (Figure 3-13) is used, on an interrupt return, 
to restart the instruction which was in the decode stage when the original interrupt or 
trap was taken. 



Figure 3-13 Program Counter Register

Bits 31-2: Program Counter (PC0)— This field captures the word-address of an
instruction as it enters the decode stage of the processor pipeline, unless the Freeze
(FZ) bit of the Current Processor Status Register is 1. If the FZ bit is 1, PC0 holds its
value.

When an interrupt or trap is taken, the PC0 field contains the word-address of the
instruction in the decode stage; the interrupt or trap has prevented this instruction
from executing. The processor uses the PC0 field to restart this instruction on an
interrupt return.



Bits 1-0: Zeros— These bits are zero, since instruction addresses are always word
aligned.

3.2.3.12 PROGRAM COUNTER 1 (PC1, REGISTER 11)

This protected special-purpose register (Figure 3-14) is used, on an interrupt return, 
to restart the instruction that was in the execute stage when the original interrupt or 
trap was taken. 



Figure 3-14 Program Counter 1 Register

Bits 31-2: Program Counter 1 (PC1)— This field captures the word-address of an
instruction as it enters the execute stage of the processor pipeline, unless the Freeze
(FZ) bit of the Current Processor Status Register is 1. If the FZ bit is 1, PC1 holds its
value.

When an interrupt or trap is taken, the PC1 field contains the word-address of the
instruction in the execute stage; the interrupt or trap has prevented this instruction
from completing execution. The processor uses the PC1 field to restart this instruction
on an interrupt return.

Bits 1-0: Zeros— These bits are zero, since instruction addresses are always word
aligned.

3.2.3.13 PROGRAM COUNTER 2 (PC2, REGISTER 12)

This protected special-purpose register (Figure 3-15) reports the address of certain
instructions causing traps.



Figure 3-15 Program Counter 2 Register

Bits 31-2: Program Counter 2 (PC2)— This field captures the word address of an
instruction as it enters the write-back stage of the processor pipeline, unless the
Freeze (FZ) bit of the Current Processor Status Register is 1. If the FZ bit is 1, PC2
holds its value.

When an interrupt or trap is taken, the PC2 field contains the word address of the 
instruction in the write-back stage. In certain cases, as described in Section 3.5.9, 
PC2 contains the address of the instruction causing a trap. The PC2 field is used to 
report the address of this instruction, and has no other use in the processor. 

Bits 1-0: Zeros— These bits are zero, since instruction addresses are always word 
aligned. 

3.2.3.14 MMU CONFIGURATION (MMU, REGISTER 13)

This protected special-purpose register (Figure 3-16) specifies parameters associated 
with the Memory Management Unit (MMU). 



Figure 3-16 MMU Configuration Register

Bits 31-10: Reserved.

Bits 9-8: Page Size (PS)— The PS field specifies the page size for address translation.
The page size affects translation as discussed in Section 3.6.2. The PS field has
a delayed effect on address translation (see Section 3.6.2). At least one cycle of delay
must separate an instruction which sets the PS field and an instruction that performs
address translation. The PS field is encoded as follows:

PS    Page Size
00
10
11

Bits 7-0: Process Identifier (PID)— For translated User-mode loads and stores, this 
8-bit field is compared to Task Identifier (TID) fields in Translation Look-Aside Buffer 
entries when address translation is performed. For the address translation to be valid, 
the PID field must match the TID field in an entry. This allows a separate 32-bit virtual- 
address space to be allocated to each active User-mode process (within the limit of 
255 such processes). Translated Supervisor-mode and Monitor-mode loads and 
stores use a fixed process identifier of zero, and require that the TID field be zero for
successful translation. 
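
The identifier comparison described above can be sketched as follows. This is an illustration of the PID/TID rule only and covers none of the other conditions of a TLB match; the names cpu_mode, mmu_pid, and tlb_tid_matches are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { MODE_USER, MODE_SUPERVISOR, MODE_MONITOR } cpu_mode;

/* Extract the PID field (bits 7-0) of the MMU Configuration Register. */
static uint8_t mmu_pid(uint32_t mmu)
{
    return (uint8_t)(mmu & 0xFFu);
}

/* Identifier check only: User-mode accesses compare the TID of a TLB entry
 * against the PID field; Supervisor-mode and Monitor-mode accesses use a
 * fixed process identifier of zero. */
static bool tlb_tid_matches(cpu_mode mode, uint32_t mmu, uint8_t tid)
{
    uint8_t effective_pid = (mode == MODE_USER) ? mmu_pid(mmu) : 0;
    return tid == effective_pid;
}
```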

3.2.3.15 LRU RECOMMENDATION (LRU, REGISTER 14)

This protected special-purpose register (Figure 3-17) assists Translation Look-Aside 
Buffer (TLB) re-loading by indicating the least-recently used TLB entry in the required 
replacement line. 



Figure 3-17 LRU Recommendation Register

Bits 31-7: Reserved.

Bits 6-1: Least-Recently Used Entry (LRU)— The LRU field is updated whenever a
TLB miss occurs during an address translation. It gives the TLB register number of
the TLB entry selected for replacement. The LRU field also is updated whenever a
memory-protection violation occurs; however, it has no interpretation in this case.

Bit 0: Zero— The appended 0 serves to identify Word 0 of the TLB entry.

3.2.3.16 REASON VECTOR (RSN, REGISTER 15)

This protected special-purpose register (Figure 3-18) reports the cause of a trap into
the Monitor Mode.

Figure 3-18

Bits 31-8: Reserved.

Bits 7-0: Reason Vector (RSN)— The RSN field is set whenever a Monitor trap 
occurs (see Section 3.5.7). The RSN field is set to the vector number of the trap which 
would have been taken had the Monitor trap not been taken. 

3.2.3.17 REGION MAPPING ADDRESS (RMA0, REGISTER 16)

This protected special-purpose register (Figure 3-19) specifies a mapping from a 
region of virtual address space to physical address space. Together with the Region 
Mapping Control Register, it controls the Region Mapping Unit 0. 

Figure 3-19 Region Mapping Address Register

Bits 31-16: Virtual Base Address (VBA)— The VBA field defines the base address
of the virtual region to be mapped. The most-significant bits of this field are compared
to the corresponding bits of the virtual address during address translation. The number
of bits compared is determined by the size of the virtual region, as defined by the
Region Size field of the Region Mapping Control Register. All unused bits of the
VBA field must be 0.

Bits 15-0: Physical Base Address (PBA)— The PBA field defines the base address 
of the physical region. When an address translation is performed, the most-significant 
bits of this field replace the corresponding bits of the virtual address. The number of 
bits replaced is determined by the size of the virtual region, as defined by the Region 
Size field of the Region Mapping Control Register. All unused bits of the PBA field 
must be 0. 
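
The compare-and-substitute operation on the VBA and PBA fields can be sketched as below. This is a minimal illustration under stated assumptions: the region size is taken to be 64 Kbytes shifted left by the Region Size (RGS) value of the Region Mapping Control Register, so that RGS low-order bits of each 16-bit base-address field are ignored. The function name region_translate is hypothetical, and the valid bit, task identifier, and access permissions held in the control register are not checked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* rma: Region Mapping Address Register value (VBA in bits 31-16, PBA in 15-0).
 * rgs: Region Size value, 0..15 (assumed region size: 64 Kbytes << rgs).
 * Returns true and stores the physical address in *pa when va lies inside
 * the virtual region; returns false otherwise. */
static bool region_translate(uint32_t rma, unsigned rgs, uint32_t va,
                             uint32_t *pa)
{
    assert(rgs <= 15);

    /* Bits of the 16-bit base-address fields that take part in the match. */
    uint32_t mask      = (0xFFFFu << rgs) & 0xFFFFu;
    uint32_t addr_mask = mask << 16;       /* same bits, in address position */
    uint32_t vba       = (rma >> 16) & mask;
    uint32_t pba       = rma & mask;

    if (((va >> 16) & mask) != vba)
        return false;                      /* address is outside the region */

    /* Replace the compared bits with the physical base, keep the offset. */
    *pa = (pba << 16) | (va & ~addr_mask);
    return true;
}
```

With VBA = 0x4000, PBA = 0x8000, and RGS = 0, the virtual address 0x40001234 maps to 0x80001234 in this sketch, and 0x40011234 falls outside the 64-Kbyte region.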

3.2.3.18 REGION MAPPING CONTROL (RMC0, REGISTER 17)

This protected special-purpose register (Figure 3-20) contains control information
associated with the mapping specified by the Region Mapping Address Register.
Together with the Region Mapping Address Register, it controls the Region Mapping
Unit 0.



Figure 3-20 Region Mapping Control Register

Bits 31-24: Reserved.

Bits 23-22: User-Programmable (PGM)— These bits are placed on the
MPGM(1-0) outputs when a translated address is transmitted for an access.
They have no predefined effect on the access; any effect is defined by logic 
external to the processor. 

Bit 21: Reserved.

Bits 20-17: Region Size (RGS)— The RGS field defines the size of the virtual region.
The value in the RGS field is the number of low-order address bits which are ignored
in virtual address comparisons and physical address substitutions. Thus, if the RGS
value is 0000, the size of the virtual region is 64 Kbytes; if the RGS value is 0001, the size
of the virtual region is 128 Kbytes; and so on, up to an RGS value of 1111 and a maximum
virtual region size of 2 Gbytes.

Bit 16: Input/Output Address Space (IO)— When the IO bit is 1, a valid translation
results in an access to the input/output address space. When the IO bit is 0, the access
is performed in the instruction/data memory address space.

Bit 15: Reserved.

Bit 14: Valid Entry (VE)— If the VE bit is 1, the Region Mapping Address Register
specifies a valid translation. If the VE bit is 0, the translation is invalid.

Bit 13: Supervisor Read (SR)— When the SR bit is 1, Supervisor-mode load operations
to the virtual region are permitted. When the SR bit is 0, such loads are not
permitted, and any attempt is trapped with a Data MMU Protection Violation.

Bit 12: Supervisor Write (SW)— When the SW bit is 1, Supervisor-mode store operations
to the virtual region are permitted. When the SW bit is 0, such stores are not
permitted, and any attempt is trapped with a Data MMU Protection Violation.

Bit 11: Supervisor Execute (SE)— When the SE bit is 1, Supervisor-mode instruction
accesses to the virtual region are permitted. When the SE bit is 0, such accesses are
not permitted, and any attempt is trapped with an Instruction MMU Protection
Violation.

Bit 10: User Read (UR)— When the UR bit is 1, User-mode load operations to the
virtual region are permitted. When the UR bit is 0, such loads are not permitted, and
any attempt is trapped with a Data MMU Protection Violation.

Bit 9: User Write (UW)— When the UW bit is 1, User-mode store operations to the
virtual region are permitted. When the UW bit is 0, such stores are not permitted, and
any attempt is trapped with a Data MMU Protection Violation.

Bit 8: User Execute (UE)— When the UE bit is 1, User-mode instruction accesses to
the virtual region are permitted. When the UE bit is 0, such accesses are not permitted,
and any attempt is trapped with an Instruction MMU Protection Violation.

Bits 7-0: Task Identifier (TID)— The Task Identifier field allows the Region Mapping
Unit to be associated with a particular process. For a translation to be valid,
the TID field must match the Process Identifier (PID) in the MMU Configuration Register.
If the Task Identifier is zero, however, any otherwise-valid Supervisor-mode or
Monitor-mode access is allowed, even if the Process Identifier is not zero.

3.2.3.19 REGION MAPPING ADDRESS 1 (RMA1, REGISTER 18) 

This protected special-purpose register specifies a mapping from a region of virtual 
address space to physical address space. Together with the Region Mapping Control 
1 Register, it controls the Region Mapping Unit 1.

The structure of the Region Mapping Address 1 Register is identical to that of the 
Region Mapping Address Register (Figure 3-19). 

3.2.3.20 REGION MAPPING CONTROL 1 (RMC1, REGISTER 19) 

This protected special-purpose register contains control information associated with 
the mapping specified by the Region Mapping Address 1 Register. Together with the 
Region Mapping Address 1 Register, it controls the Region Mapping Unit 1. 

The structure of the Region Mapping Control 1 Register is identical to that of the 
Region Mapping Control Register (Figure 3-20). 

3.2.3.21 SHADOW PROGRAM COUNTER (SPC0, REGISTER 20)

This protected special-purpose register (Figure 3-21) is analogous to the Program
Counter Register, except that it operates even when the FZ bit of the Current Processor
Status Register is 1; it freezes only upon entry into the Monitor Mode. The
Shadow Program Counter Register is used upon exit from the Monitor Mode to
restart the instruction which was in the decode stage at the time of entry.

Figure 3-21

Bits 31-2: Shadow Program Counter (SPC0)— This field captures the word-address
of an instruction as it enters the decode stage of the processor pipeline,
unless the processor is in the Monitor Mode. While the processor is in the Monitor
Mode, the value of SPC0 is not modified.

Bits 1-0: Zeros— These bits are always zero, since instruction addresses are word- 
aligned. 

3.2.3.22 SHADOW PROGRAM COUNTER 1 (SPC1, REGISTER 21) 

This protected special-purpose register (Figure 3-22) is analogous to the Program 
Counter 1 Register, except that it operates even when the FZ bit of the Current Processor
Status Register is 1; it freezes only upon entry into the Monitor Mode. The
Shadow Program Counter 1 Register is used upon exit from the Monitor Mode to
restart the instruction which was in the execute stage at the time of entry.

Figure 3-22 Shadow Program Counter 1 Register

Bits 31-2: Shadow Program Counter 1 (SPC1)— This field captures the word-address
of an instruction as it enters the execute stage of the processor pipeline, unless
the processor is in the Monitor Mode. While the processor is in the Monitor Mode, the 
value of SPC1 is not modified. 

Bits 1-0: Zeros— These bits are always zero, since instruction addresses are word- 
aligned. 

3.2.3.23 SHADOW PROGRAM COUNTER 2 (SPC2, REGISTER 22) 

This protected special-purpose register (Figure 3-23) is analogous to the Program 
Counter 2 Register, except that it operates even when the FZ bit of the Current Processor
Status Register is 1; it freezes only upon entry into the Monitor Mode. The
Shadow Program Counter 2 Register provides information only; it is not used by the
processor in a return from Monitor Mode.

Figure 3-23

Bits 31-2: Shadow Program Counter 2 (SPC2)— This field captures the word-address
of an instruction as it enters the write-back stage of the processor pipeline,
unless the processor is in the Monitor Mode. While the processor is in the Monitor
Mode, the value of SPC2 is not modified.

Bits 1-0: Zeros— These bits are always zero, since instruction addresses are
word-aligned.

3.2.3.24 INSTRUCTION BREAKPOINT ADDRESS (IBA0, REGISTER 23)

This protected special-purpose register (Figure 3-24) contains the address of an 
instruction breakpoint. 



Figure 3-24 Instruction Breakpoint Address Register

Bits 31-2: Instruction Breakpoint Address (IBA)— The value in the IBA field is
compared to the value of the Program Counter to determine whether an instruction 
breakpoint has been encountered. 

Bits 1-0: Zeros— These bits are always zero, since instruction addresses are word- 
aligned. 

3.2.3.25 INSTRUCTION BREAKPOINT CONTROL (IBC0, REGISTER 24)

This protected special-purpose register (Figure 3-25) contains control and status 
information for the instruction breakpoint specified by the Instruction Breakpoint Ad- 
dress Register. 



Figure 3-25

Bits 31-13: Reserved.

Bit 12: Breakpoint Has Occurred (BHO)— The BHO bit indicates whether a trap for a
valid breakpoint comparison has occurred. When such a trap occurs, the BHO bit is
set to 1. At the next valid breakpoint comparison, the BHO bit is reset to 0, and the
breakpoint trap is not taken. The BHO bit acts as a temporary breakpoint disable,
ensuring that only one breakpoint comparison trap is taken each time the breakpoint
is encountered and allowing the processor to progress past the breakpoint address.

Bit 11: Breakpoint Enable (BEN)— When the BEN bit is 1, the breakpoint comparison
is enabled. When the BEN bit is 0, the breakpoint comparison is disabled and
neither a breakpoint nor a synchronization pulse is generated when the breakpoint
condition is met. The BEN bit is initialized to 0 upon reset.

Bit 10: Break or Synchronize (BSY)— The BSY bit determines the action taken
when the breakpoint condition is met. If the BSY bit is 1, a breakpoint occurs; if the
BSY bit is 0, a synchronization pulse is generated (see Section 5.3).

Bit 9: Break ROM (BRM)— If the BRM bit is 0, the breakpoint comparison is performed
only for addresses in the instruction memory address space. If the BRM bit is
1, the breakpoint comparison is performed only for addresses in the instruction ROM
address space.

Bit 8: Break on Translation Enabled (BTE)— If the BTE bit is 1, the breakpoint
comparison is performed only when instruction translation is enabled (that is, when
the PI bit of the Current Processor Status Register is 0). If the BTE bit is 0, the breakpoint
comparison is performed when instruction translation is disabled (the PI bit is 1).
Comparisons for translated addresses are further conditioned by the BPID field and
the Process Identifier field of the MMU Configuration Register; these fields are ignored
if the BTE bit is 0.

Bits 7-0: Breakpoint Process Identifier (BPID)— The BPID field allows the breakpoint
comparison of virtual instruction addresses to be associated with a particular
process. The BPID field is ignored for untranslated instruction addresses. For a
User-mode virtual instruction address, the value of the BPID field must match the value of
the PID field of the MMU Configuration Register for the breakpoint comparison to be 
valid. For a Supervisor-mode virtual address, the breakpoint condition is met only if 
the value of the BPID field is 0. 
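
The way the BEN, BRM, BTE, and BPID fields qualify the address comparison can be sketched as a single predicate. This is one reading of the field descriptions above, offered as an illustration rather than a statement of the hardware behavior: the names fetch_info and ibc_condition_met are hypothetical, Monitor-mode fetches are not distinguished from Supervisor-mode fetches, and the BHO and BSY bits (which decide what happens once the condition is met) are not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Description of one instruction fetch; all field names are illustrative. */
typedef struct {
    uint32_t pc;         /* instruction address */
    bool     rom_space;  /* true: instruction ROM space, false: instruction memory */
    bool     translated; /* true: instruction translation enabled (PI = 0) */
    bool     user_mode;  /* true: User mode, false: Supervisor mode */
    uint8_t  pid;        /* PID field of the MMU Configuration Register */
} fetch_info;

/* Returns true when the breakpoint condition is met for this fetch. */
static bool ibc_condition_met(uint32_t iba, uint32_t ibc, const fetch_info *f)
{
    bool    ben  = ((ibc >> 11) & 1u) != 0;   /* Breakpoint Enable */
    bool    brm  = ((ibc >> 9) & 1u) != 0;    /* Break ROM */
    bool    bte  = ((ibc >> 8) & 1u) != 0;    /* Break on Translation Enabled */
    uint8_t bpid = (uint8_t)(ibc & 0xFFu);    /* Breakpoint Process Identifier */

    if (!ben)
        return false;                         /* comparison disabled */
    if ((iba & ~3u) != (f->pc & ~3u))
        return false;                         /* word addresses differ */
    if (brm != f->rom_space)
        return false;                         /* wrong address space */
    if (bte != f->translated)
        return false;                         /* wrong translation state */
    if (f->translated) {
        /* User mode must match PID; Supervisor mode requires BPID of 0. */
        uint8_t required = f->user_mode ? f->pid : 0;
        if (bpid != required)
            return false;
    }
    return true;
}
```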

3.2.3.26 INSTRUCTION BREAKPOINT ADDRESS 1 (IBA1, REGISTER 25) 

This protected special-purpose register contains the address of an instruction break- 
point. 

The structure of the Instruction Breakpoint Address 1 Register is identical to that of 
the Instruction Breakpoint Address Register (Figure 3-24). 

3.2.3.27 INSTRUCTION BREAKPOINT CONTROL 1 (IBC1, REGISTER 26) 

This protected special-purpose register contains control and status information for the 
instruction breakpoint specified by the Instruction Breakpoint Address 1 Register.

The structure of the Instruction Breakpoint Control 1 Register is identical to that of the 
Instruction Breakpoint Control Register (Figure 3-25). 

3.2.3.28 REGISTERS 112-127: RESERVED FOR TESTING

Special-purpose registers 112 to 127 are reserved for hardware testing. In the User
Mode, an attempt to read or write these registers causes a Protection Violation trap. 
In the Supervisor and Monitor Modes, attempted writes have unpredictable effects on 
processor operation. 

3.2.3.29 INDIRECT POINTER C (IPC, REGISTER 128) 

This unprotected special-purpose register (Figure 3-26) provides the RC-operand 
register number (see Section 8.3) when an instruction RC field has the value zero 
(i.e., when Global Register 0 is specified).



Figure 3-26 Indirect Pointer C Register

Bits 31-10: Reserved.

Bits 9-2: Indirect Pointer C (IPC)— The 8-bit IPC field contains an absolute-register 
number for a general-purpose register. This number directly selects a register (Stack- 
Pointer addition is not performed in the case of local registers). 

Bits 1-0: Zeros— The IPC field is aligned for compatibility with word addresses. 

3.2.3.30 INDIRECT POINTER A (IPA, REGISTER 129)

This unprotected special-purpose register (Figure 3-27) provides the RA-operand 
register number (see Section 8.3) when an instruction RA field has the value zero 
(i.e., when Global Register 0 is specified).

Figure 3-27 Indirect Pointer A Register

Bits 31-10: Reserved.

Bits 9-2: Indirect Pointer A (IPA)— The 8-bit IPA field contains an absolute-register
number for either a general-purpose register or a local register. This number directly
selects a register (Stack-Pointer addition is not performed in the case of local registers).

Bits 1-0: Zeros — The IPA field is aligned for compatibility with word addresses. 

3.2.3.31 INDIRECT POINTER B (IPB, REGISTER 130) 

This unprotected special-purpose register (Figure 3-28) provides the RB-operand 
register number (see Section 8.3) when an instruction RB field has the value zero 
(i.e., when Global Register 0 is specified).



Figure 3-28 Indirect Pointer B Register

Bits 31-10: Reserved. 

Bits 9-2: Indirect Pointer B (IPB)— The 8-bit IPB field contains an absolute-register 
number for a general-purpose register. This number directly selects a register (Stack- 
Pointer addition is not performed in the case of local registers). 

Bits 1-0: Zeros — The IPB field is aligned for compatibility with word addresses. 

3.2.3.32 Q (Q, REGISTER 131)

The Q Register is an unprotected special-purpose register (Figure 3-29).

Figure 3-29 Q Register

Bits 31-0: Quotient/Multiplier (Q)— During a sequence of divide steps, this field
holds the low-order bits of the dividend; it contains the quotient at the end of the
divide. During a sequence of multiply steps, this field holds the multiplier; it contains
the low-order bits of the result at the end of the multiply.

For an integer divide instruction, the Q field contains the high-order bits of the dividend
at the beginning of the instruction, and contains the remainder upon completion
of the instruction.

3.2.3.33 ALU STATUS (ALU, REGISTER 132)

This unprotected special-purpose register (Figure 3-30) holds information about the 
outcome of Arithmetic/Logic Unit (ALU) operations as well as control for certain opera- 
tions performed by the Execution Unit. 



Figure 3-30

Bits 31-12: Reserved.

Bit 11: Divide Flag (DF)— The DF bit is used by the instructions that implement
division. This bit is set at the end of the division instructions either to 1 or to the complement
of the 33rd bit of the ALU. When a Divide Step instruction is executed,
the DF bit determines whether an addition or subtraction operation is performed by
the ALU.

Bit 10: Overflow (V)— The V bit indicates that the result of a signed, two's-complement
ALU operation required more than 32 bits to represent the result correctly. The
value of this bit is determined by exclusive-ORing the ALU carry-out with the carry-in
to the most-significant bit for signed, two's-complement operations. This bit is not
used for any special purpose in the processor, and is provided for information only.

Bit 9: Negative (N)— The N bit is set with the value of the most-significant bit of the 
result of an arithmetic or logical operation. If two's-complement overflow occurs, the N 
bit does not reflect the true sign of the result. This bit is used in divide operations. 

Bit 8: Zero (Z)— The Z bit indicates that the result of an arithmetic or logical operation 
is zero. This bit is not used for any special purpose in the processor, and is provided 
for information only. 

Bit 7: Carry (C)— The C bit stores the carry-out of the ALU for arithmetic operations. 
It is used by the add-with-carry and subtract-with-carry instructions to generate the 
carry into the Arithmetic/Logic Unit. 
Bits 6-5: Byte Pointer (BP)— The BP field holds a 2-bit pointer to a byte within a
word. It is used by Insert Byte and Extract Byte instructions. The mapping of the 
pointer value to the byte position depends on the value of the Byte Order (BO) bit in 
the Configuration Register. 

The most-significant bit of the BP field is used to determine the position of a half-word 
within a word for the Insert Half-Word, Extract Half-Word, and Extract Half-Word, 
Sign-Extended instructions. The mapping of the most-significant bit to the half-word 
position depends on the value of the BO bit in the Configuration Register. 

The BP field is set by a Move To Special Register instruction with either the ALU 
Status Register or the Byte Pointer Register as the destination. It is also set by a load 
or store instruction if the Set Byte Pointer (SB) bit in the instruction is 1. A load or
store sets the BP field either with the two least-significant bits of the address (if the 
DW bit of the Configuration Register is 0) or with the complement of the Byte Order bit 
of the Configuration Register (if DW is 1). 

Bits 4-0: Funnel Shift Count (FC)— The FC field contains a 5-bit shift count for the 
Funnel Shifter. The Funnel Shifter concatenates two source-operands into a single 
64-bit operand and extracts a 32-bit result from this 64-bit operand; the FC field speci- 
fies the number of bit positions from the most-significant bit of the 64-bit operand to 
the most-significant bit of the 32-bit result. The FC field is used by the EXTRACT 
instruction. 

The FC field is set by a Move To Special Register instruction with either the ALU 
Status Register or the Funnel Shift Count Register as the destination. 
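
The field positions listed above can be summarized with a small decoding sketch. This is only an illustration of the bit layout in Python; the function names `decode_alu_status` and `with_fc` are hypothetical and do not correspond to any processor instruction.

```python
def decode_alu_status(value):
    """Split a 32-bit ALU Status Register value into its named fields."""
    return {
        "DF": (value >> 11) & 1,   # Divide Flag, bit 11
        "V": (value >> 10) & 1,    # Overflow, bit 10
        "N": (value >> 9) & 1,     # Negative, bit 9
        "Z": (value >> 8) & 1,     # Zero, bit 8
        "C": (value >> 7) & 1,     # Carry, bit 7
        "BP": (value >> 5) & 0x3,  # Byte Pointer, bits 6-5
        "FC": value & 0x1F,        # Funnel Shift Count, bits 4-0
    }

def with_fc(value, fc):
    """Return value with only the FC field (bits 4-0) replaced."""
    return (value & 0xFFFFFFE0) | (fc & 0x1F)
```

The second helper mirrors the effect of changing FC alone while leaving the other fields of the ALU Status Register untouched.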



3.2.3.34 BYTE POINTER (BP, REGISTER 133) 

This unprotected special-purpose register (Figure 3-31) provides an alternate access
to the BP field in the ALU Status Register.

Figure 3-31 Byte Pointer Register

Bits 31-2: Zeros.

Bits 1-0: Byte Pointer (BP)— This field allows a program to change the BP field 
without affecting other fields in the ALU Status Register. 

3.2.3.35 FUNNEL SHIFT COUNT (FC, REGISTER 134) 

This unprotected special-purpose register (Figure 3-32) provides an alternate access 
to the FC field in the ALU Status Register. 



Figure 3-32 Funnel Shift Count Register 
Bits 31-5: Zeros.

Bits 4-0: Funnel Shift Count (FC)— This field allows a program to change the FC 
field without affecting other fields in the ALU Status Register. 
3.2.3.36 LOAD/STORE COUNT REMAINING (CR, REGISTER 135)

This unprotected special-purpose register (Figure 3-33) provides alternate access to 
the CR field in the Channel Control Register. 

Figure 3-33 Load/Store Count Remaining Register

Bits 31-8: Zeros.

Bits 7-0: Load/Store Count Remaining (CR)— This field allows a program to
change the CR field without affecting other fields in the Channel Control Register, and
is used to initialize the value before a Load Multiple or Store Multiple instruction is
executed.

3.2.3.37 FLOATING-POINT ENVIRONMENT (FPE, REGISTER 160)

This unprotected special-purpose register (Figure 3-34) contains control bits that
affect the execution of floating-point operations. Writing the Floating-Point Environment
Register is a serializing operation; that is, all currently executing floating-point
operations are completed before the write is performed.

Figure 3-34 Floating-Point Environment Register

Bits 31-11: Reserved.

Bits 10-9: Accumulator Format (ACF)— The ACF field specifies the format of the
Floating-Point Accumulator Registers, as follows:

ACF1-0    Accumulator Format

00        Reserved
01        Single-Precision
10        Double-Precision
11        Reserved



Bit 8: Fast Float Select (FF)— The FF bit being 1 enables fast floating-point operations,
in which certain requirements of the IEEE floating-point specification are not
met. This improves the performance of certain operations by sacrificing conformance 
to the IEEE specification. The fast floating-point operations are discussed in 
Section 7.2.8. 
Bits 7-6: Floating-Point Round Mode (FRM)— This field specifies the default mode
used to round the results of floating-point operations, as follows:

FRM1-0    Round Mode

00        Round to nearest
01        Round to -∞
10        Round to +∞
11        Round to zero

Rounding is discussed in Section 7.2.7.

Bit 5: Floating-Point Divide-By-Zero Mask (DM)— If the DM bit is 0, a Floating-Point
Exception trap occurs when the divisor of a floating-point division operation is zero
and the dividend is a non-zero, finite number. If the DM bit is 1, a Floating-Point
Exception trap does not occur for divide-by-zero.

Bit 4: Floating-Point Inexact Result Mask (XM)— If the XM bit is 0, a Floating-Point
Exception trap occurs when the result of a floating-point operation is not equal to the
infinitely precise result. If the XM bit is 1, a Floating-Point Exception trap does not
occur for an inexact result.

Bit 3: Floating-Point Underflow Mask (UM)— If the UM bit is 0, a Floating-Point
Exception trap occurs when the result of a floating-point operation is too small to be
expressed in the destination format. If the UM bit is 1, a Floating-Point Exception trap
does not occur for underflow.

Bit 2: Floating-Point Overflow Mask (VM)— If the VM bit is 0, a Floating-Point
Exception trap occurs when the result of a floating-point operation is too large to be
expressed in the destination format. If the VM bit is 1, a Floating-Point Exception trap
does not occur for overflow.

Bit 1: Floating-Point Reserved Operand Mask (RM)— If the RM bit is 0, a Floating-
Point Exception trap occurs when one or more input operands to a floating-point
operation is a reserved value, or when the result of a floating-point operation is a
reserved value. If the RM bit is 1, a Floating-Point Exception trap does not occur for
reserved operands.

Bit 0: Floating-Point Invalid Operation Mask (NM)— If the NM bit is 0, a Floating-
Point Exception trap occurs when the input operands to a floating-point operation
produce an indeterminate result (e.g., ∞ times 0). If the NM bit is 1, a Floating-Point
Exception trap does not occur for invalid operations. 
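
To make the field layout of the Floating-Point Environment Register concrete, the following Python sketch composes a register value from symbolic settings. The helper name `encode_fpe` and the string keys are hypothetical conveniences; only the bit positions and encodings come from the descriptions above.

```python
# Encodings taken from the ACF and FRM tables; the reserved ACF codes are omitted.
ACF = {"single": 0b01, "double": 0b10}
FRM = {"nearest": 0b00, "minus_inf": 0b01, "plus_inf": 0b10, "zero": 0b11}
MASK_BITS = {"DM": 5, "XM": 4, "UM": 3, "VM": 2, "RM": 1, "NM": 0}

def encode_fpe(acf="single", fast_float=False, round_mode="nearest", masked=()):
    """Build a Floating-Point Environment Register value.

    masked -- names of exception masks to set to 1 (trap suppressed).
    """
    value = ACF[acf] << 9            # ACF, bits 10-9
    value |= int(fast_float) << 8    # FF, bit 8
    value |= FRM[round_mode] << 6    # FRM, bits 7-6
    for name in masked:
        value |= 1 << MASK_BITS[name]
    return value
```

For instance, `encode_fpe("double", round_mode="zero", masked=("XM", "UM"))` selects a double-precision accumulator, round-to-zero, and suppresses traps for inexact results and underflow.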

3.2.3.38 INTEGER ENVIRONMENT (INTE, REGISTER 161)

This unprotected special-purpose register (Figure 3-35) contains control bits which 
affect the execution of integer multiplication and division operations. Writing the Inte- 
ger Environment Register is a serializing operation. All currently executing operations 
are completed before the write is performed. 



Figure 3-35 Integer Environment Register

Bits 31-2: Reserved.

Bit 1: Integer Division Overflow Mask (DO)— If the DO bit is 0, an Out of Range trap
occurs when overflow of a signed or unsigned 32-bit result occurs during a DIVIDE or
DIVIDU instruction, respectively. If the DO bit is 1, an Out of Range trap does not
occur for overflow during integer divide operations.

The DIVIDE and DIVIDU instructions always cause an Out of Range trap upon division
by zero, regardless of the value of the DO bit.

Bit 0: Integer Multiplication Overflow Exception Mask (MO)— If the MO bit is 0, an
Out of Range trap occurs when overflow of a signed or unsigned 32-bit result occurs
during a MULTIPLY or MULTIPLU instruction, respectively. If the MO bit is 1, an Out
of Range trap does not occur for overflow during integer multiply operations.

3.2.3.39 FLOATING-POINT STATUS (FPS, REGISTER 162) 

This unprotected special-purpose register (Figure 3-36) contains status bits indicating 
the outcome of floating-point operations. 

The floating-point status bits are divided into two groups. The first group consists of 
the sticky status bits (DS, XS, US, VS, RS, and NS), which, once set, remain set until 
explicitly cleared by a Move-to-Special-Register (MTSR) or Move-to-Special-Register- 
Immediate (MTSRIM) instruction. Sticky status bits are updated in either of two ways: 

1. For floating-point operations that do not cause a Floating-Point Exception trap
(FMAC, DMAC, FMSM, DMSM, and MTACC), all sticky status bits are updated at 
the end of instruction execution. 

2. For all other floating-point operations, including CONVERT, only those sticky 
status bits corresponding to masked exceptions are updated. The update occurs 
at the end of instruction execution. 

The second group consists of the trap status bits (DT, XT, UT, VT, RT, and NT), 
which report the status of an operation for which a Floating-Point Exception trap is 
taken. These bits are updated only by an operation which takes a trap as a result of 
an unmasked Floating-Point Exception; all other operations leave these bits unchanged.
A trap status bit is updated regardless of the state of the corresponding
exception mask in the Floating-Point Environment Register.

Reading or writing the Floating-Point Status Register is a serializing operation. All
currently executing floating-point operations are completed before the read or write is
performed.



Figure 3-36 Floating-Point Status Register

Bits 31-14: Reserved. 

Bit 13: Floating-Point Divide By Zero Trap (DT)— The DT bit is set when a Floating-
Point Exception trap occurs, and the associated floating-point operation is a divide
with a zero divisor and a non-zero, finite dividend. Otherwise, this bit is reset when a
Floating-Point Exception trap occurs.

Bit 12: Floating-Point Inexact Result Trap (XT)— The XT bit is set when a Floating-
Point Exception trap occurs, and the result of the associated floating-point operation
is not equal to the infinitely precise result. Otherwise, this bit is reset when a Floating-
Point Exception trap occurs.

Bit 11: Floating-Point Underflow Trap (UT)— The UT bit is set when a Floating-
Point Exception trap occurs, and the result of the associated floating-point operation
is too small to be expressed in the destination format. Otherwise, this bit is reset when
a Floating-Point Exception trap occurs.

Bit 10: Floating-Point Overflow Trap (VT)— The VT bit is set when a Floating-Point
Exception trap occurs, and the result of the associated floating-point operation is too
large to be expressed in the destination format. Otherwise, this bit is reset when a
Floating-Point Exception trap occurs.

Bit 9: Floating-Point Reserved Operand Trap (RT)— The RT bit is set when a Floating-
Point Exception trap occurs, and the result of the associated floating-point operation
is a reserved value. Otherwise, this bit is reset when a Floating-Point Exception
trap occurs.

Bit 8: Floating-Point Invalid Operation Trap (NT)— The NT bit is set when a Floating-
Point Exception trap occurs, and the input operands to the associated floating-point
operation produce an indeterminate result. Otherwise, this bit is reset when a
Floating-Point Exception trap occurs.

Bits 7-6: Reserved.

Bit 5: Floating-Point Divide By Zero Sticky (DS)— The DS bit is set when the DM
bit of the Floating-Point Environment Register is 1, the divisor of a floating-point
division operation is a zero, and the dividend is a non-zero, finite number.

Bit 4: Floating-Point Inexact Result Sticky (XS)— The XS bit is set when the XM bit
of the Floating-Point Environment Register is 1, and the result of a floating-point
operation is not equal to the infinitely precise result.

Bit 3: Floating-Point Underflow Sticky (US)— The US bit is set when the UM bit of
the Floating-Point Environment Register is 1, and the result of a floating-point
operation is too small to be expressed in the destination format.

Bit 2: Floating-Point Overflow Sticky (VS)— The VS bit is set when the VM bit of the
Floating-Point Environment Register is 1, and the result of a floating-point operation is
too large to be expressed in the destination format.

Bit 1: Floating-Point Reserved Operand Sticky (RS)— The RS bit is set when the
RM bit of the Floating-Point Environment Register is 1, and either one or more input
operands to a floating-point operation is a reserved value or the result of a floating-
point operation is a reserved value.

Bit 0: Floating-Point Invalid Operation Sticky (NS)— The NS bit is set when the NM
bit of the Floating-Point Environment Register is 1, and the input operands to a
floating-point operation produce an indeterminate result.
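
The interplay between the exception masks, the sticky bits, and the trap bits can be sketched as a small Python model. It covers only operations that are able to trap (the second update rule above), represents the six conditions as a 6-bit vector in the order D, X, U, V, R, N, and assumes that sticky bits for masked conditions are still recorded when a different, unmasked condition causes a trap. That last point, like the name `update_fps`, is an assumption of this sketch rather than a statement from the text.

```python
SIX = 0x3F        # six condition bits: D, X, U, V, R, N in bits 5-0
TRAP_SHIFT = 8    # DT..NT occupy bits 13-8 of the status register

def update_fps(fps, fpe_masks, raised):
    """Return (new_fps, trap_taken) after one floating-point operation.

    fps       -- current Floating-Point Status Register value
    fpe_masks -- bits 5-0 of the Floating-Point Environment Register
    raised    -- conditions detected by the operation (same bit order)
    """
    masked = raised & fpe_masks & SIX
    unmasked = raised & ~fpe_masks & SIX
    fps |= masked                               # sticky bits are only ever set here
    if unmasked:
        fps &= ~(SIX << TRAP_SHIFT)             # trap bits not detected are reset
        fps |= (raised & SIX) << TRAP_SHIFT     # every detected condition is reported
    return fps, bool(unmasked)
```

With these conventions, an inexact result under a set XM mask only sets XS, while an unmasked divide-by-zero that is also inexact sets both DT and XT and takes the trap.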

3.2.3.40 EXCEPTION OPCODE (EXOP, REGISTER 164) 

This unprotected special-purpose register (Figure 3-37) reports the opcode of an
instruction causing an Illegal Opcode, Floating-Point Exception, or Out-of-Range trap.
Writing the Exception Opcode Register is a serializing operation. All currently executing
floating-point operations are completed before the write is performed.

Figure 3-37 Exception Opcode Register

Bits 31-8: Reserved.

Bits 7-0: Instruction Opcode (IOP)— This field captures the opcode of an instruction
causing a trap as a result of instruction execution; the opcode is captured as the
instruction enters the write-back stage of the processor pipeline. Instructions that do
not trap as a consequence of execution do not modify the IOP field.

The Exception Opcode Register can be written explicitly by using it as the destination 
of a Move-to-Special-Register (MTSR) instruction. 

3.2.4 TLB Registers 

The Am29050 microprocessor contains 128 Translation Look-Aside Buffer (TLB) 
registers. The organization of the TLB registers is shown in Figure 3-38. 



Figure 3-38 Translation Look-Aside Buffer Registers

TLB Reg #    TLB Set 0

0            TLB Entry Line 0 Word 0
1            TLB Entry Line 0 Word 1
2            TLB Entry Line 1 Word 0
3            TLB Entry Line 1 Word 1
...
62           TLB Entry Line 31 Word 0
63           TLB Entry Line 31 Word 1

TLB Reg #    TLB Set 1

64           TLB Entry Line 0 Word 0
65           TLB Entry Line 0 Word 1
...
126          TLB Entry Line 31 Word 0
127          TLB Entry Line 31 Word 1


The TLB registers comprise the TLB entries, and are provided so that programs may 
inspect and alter TLB entries. This allows the loading, invalidation, saving, and restor- 
ing of TLB entries. 

TLB registers have fields that are reserved for future processor implementations. 
When a TLB register is read, a bit in a reserved field is read as a 0. An attempt to 
write a reserved bit with a 1 has no effect; however, this should be avoided because 
of upward-compatibility considerations. 

The Translation Look-aside Buffer (TLB) registers are accessed only by explicit data 
movement by Supervisor-mode programs. Instructions that move data to or from a 
TLB register specify a general-purpose register containing a TLB register number. 
The TLB register number is given by the contents of bits 6-0 of the general-purpose 
register. TLB register numbers may only be specified indirectly by general-purpose 
registers. 

TLB entries are accessed as registers numbered 0-127. Since two words are re- 
quired to completely specify a TLB entry, two registers are required for each TLB 
entry. The words corresponding to an entry are paired as two sequentially numbered 
registers starting on an even-numbered register. The word with the even register 
number is called Word 0, and the word with the odd register number is called Word 1.
The entries for TLB Set 0 are in registers numbered 0-63, and the entries for TLB
Set 1 are in registers numbered 64-127. 
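
The numbering scheme just described reduces to a simple formula. The Python sketch below states it explicitly; the function name `tlb_register_number` is hypothetical, and the range checks reflect the two sets, 32 lines per set, and two words per entry described above.

```python
def tlb_register_number(tlb_set, line, word):
    """Map (set, line, word) to a TLB register number in the range 0-127."""
    if tlb_set not in (0, 1) or not 0 <= line <= 31 or word not in (0, 1):
        raise ValueError("set must be 0-1, line 0-31, word 0-1")
    # Each set occupies 64 registers; each entry is an even/odd register pair.
    return tlb_set * 64 + line * 2 + word
```

Software would place the resulting number in bits 6-0 of a general-purpose register before moving data to or from the TLB register.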

3.2.4.1 TLB ENTRY WORD 0

The TLB Entry Word 0 register is shown in Figure 3-39.

Figure 3-39 TLB Entry Word 0

Bits 31-15: Virtual Tag (VTAG)— When the TLB is searched for an address translation,
the VTAG field of the TLB entry must match the most-significant 17, 16, 15, or 14
bits of the address being translated (for page sizes of 1, 2, 4, and 8 kb, respectively)
for the search to be successful.

When software loads a TLB entry with an address translation, the most-significant 14 
bits of the Virtual Tag are set with the most-significant 14 bits of the virtual address 
whose translation is being loaded into the TLB. The remaining three bits of the Virtual 
Tag must be set either to the corresponding bits of the address, or to zeros, depending
on the page size, as follows (A refers to corresponding address bits):

Page Size    VTAG 2-0 (TLB Word 0 Bits 17-15)

1 kb         AAA
2 kb         AA0
4 kb         A00
8 kb         000

Bit 14: Valid Entry (VE)— If this bit is 1, the associated TLB entry is valid; if it is 0, the
entry is invalid.

Bit 13: Supervisor Read (SR)— If the SR bit is 1, Supervisor-mode load operations
from the virtual page are allowed; if it is 0, Supervisor-mode loads are not allowed.

Bit 12: Supervisor Write (SW)— If the SW bit is 1, Supervisor-mode store operations
to the virtual page are allowed; if it is 0, Supervisor-mode stores are not allowed.

Bit 11: Supervisor Execute (SE)— If the SE bit is 1, Supervisor-mode instruction
accesses to the virtual page are allowed; if it is 0, Supervisor-mode instruction
accesses are not allowed.

Bit 10: User Read (UR)— If the UR bit is 1, User-mode load operations from the
virtual page are allowed; if it is 0, User-mode loads are not allowed.

Bit 9: User Write (UW)— If the UW bit is 1, User-mode store operations to the virtual
page are allowed; if it is 0, User-mode stores are not allowed.

Bit 8: User Execute (UE)— If the UE bit is 1, User-mode instruction accesses to the
virtual page are allowed; if it is 0, User-mode instruction accesses are not allowed.

Bits 7-0: Task Identifier (TID)— When the TLB is searched for an address translation,
the TID must match the Process Identifier (PID) in the MMU Configuration Register
for the translation to be successful. This field allows the TLB entry to be associated
with a particular process. 
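
As an illustration of how software might assemble Word 0 of a TLB entry, the Python sketch below packs the fields described above. It assumes that the "corresponding address bits" of the Virtual Tag occupy the same bit positions in the word as in the virtual address, so the tag can be formed by masking. The helper `make_tlb_word0` and its parameter names are hypothetical.

```python
# Tag masks per page size in kb: the low-order tag bits are zeroed for larger pages.
VTAG_MASK = {1: 0xFFFF8000, 2: 0xFFFF0000, 4: 0xFFFE0000, 8: 0xFFFC0000}
PERM_BITS = {"SR": 13, "SW": 12, "SE": 11, "UR": 10, "UW": 9, "UE": 8}

def make_tlb_word0(vaddr, page_kb, perms=(), tid=0, valid=True):
    """Build TLB Entry Word 0 for a virtual address and page size."""
    word = vaddr & VTAG_MASK[page_kb]   # VTAG, bits 31-15
    word |= int(valid) << 14            # VE, bit 14
    for name in perms:                  # access-permission bits 13-8
        word |= 1 << PERM_BITS[name]
    return word | (tid & 0xFF)          # TID, bits 7-0
```

For example, a valid 1-kb page at virtual address 0x12345678 that permits Supervisor and User reads for task 5 gives the word 0x12346405.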

3.2.4.2 TLB ENTRY WORD 1

The TLB Entry Word 1 register is shown in Figure 3-40.

Figure 3-40 TLB Entry Word 1

Bits 31-10: Real Page Number (RPN)— The RPN field gives the most-significant 22,
21, 20, or 19 bits of the physical address of the page for page sizes of 1, 2, 4, and
8 kb, respectively. It is concatenated to bits 9-0, 10-0, 11-0, or 12-0 of the address
being translated (for 1, 2, 4, and 8 kb page sizes, respectively) to form the physical
address for the access.

When software loads a TLB entry with an address translation, the most-significant 19 
bits of the Real Page Number are set with the most-significant 19 bits of the physical 
address associated with the translation. The remaining three bits of the Real Page 
Number must be set either to the corresponding bits of the physical address, or to 
zeros, depending on the page size, as follows (A refers to corresponding address 
bits): 



Page Size    RPN 2-0 (TLB Word 1 Bits 12-10)

1 kb         AAA
2 kb         AA0
4 kb         A00
8 kb         000

Bits 7-6: User Programmable (PGM)— These bits are placed on the MPGM(1-0)
outputs when the address is transmitted for an access. They have no predefined 
effect on the access; any effect is defined by logic external to the processor. 

Bit 1: Usage (U)— This bit indicates which entry in a given TLB line was least recently
used to perform an address translation. If this bit is 0, then the entry in Set 0 in the
line is least-recently-used; if it is 1, then the entry in Set 1 is least-recently-used. This
bit has an equal value for both entries in a line. Whenever a TLB entry is used to
translate an address, the Usage bits of both entries in the line used for translation are
set according to the TLB set containing the translation. This bit is set whenever the
translation is valid, regardless of the outcome of memory-protection checking.

Bit 0: Input/Output (IO)— The IO bit determines whether the access is directed to the
instruction/data memory (IO = 0) or the input/output (IO = 1) address space.
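
The way the Real Page Number combines with the untranslated low-order address bits can be sketched in Python as follows. The function name `physical_address` is hypothetical, and the sketch models only the concatenation described for the RPN field; it does not model the tag match, protection checks, or the PGM and IO bits.

```python
# Number of untranslated offset bits for each page size in kb.
OFFSET_BITS = {1: 10, 2: 11, 4: 12, 8: 13}

def physical_address(word1, vaddr, page_kb):
    """Form the physical address from TLB Entry Word 1 and a virtual address."""
    offset_mask = (1 << OFFSET_BITS[page_kb]) - 1
    # Keep only the RPN bits that are significant for this page size.
    rpn = word1 & 0xFFFFFC00 & ~offset_mask
    return rpn | (vaddr & offset_mask)
```

For a 1-kb page, ten offset bits pass through unchanged; for an 8-kb page, thirteen do, and the three low-order RPN bits are ignored.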



3.3 INSTRUCTION SET 



The Am29050 microprocessor implements 125 instructions. All instructions execute in 
a single cycle, except for IRET, IRETINV, LOADM, STOREM, and certain arithmetic 
instructions such as floating-point instructions. 

Most instructions deal with general-purpose registers for operands and results; how- 
ever, in most instructions, an 8-bit constant can be used in place of a register-based 
operand. Some instructions deal with special-purpose registers, TLB registers, exter- 
nal devices and memories, and coprocessors. 

This section describes the nine instruction classes in the Am29050 microprocessor,
and provides a brief summary of instruction operations. A detailed instruction specifi- 
cation is contained in Chapter 8. Section 8.1 describes the nomenclature used here. 

If the processor attempts to execute an instruction which is not implemented, an
Illegal Opcode trap occurs, unless the instruction is reserved for emulation (see Sec- 
tion 3.3.10). Reserved instructions are assigned separate traps. 



3.3.1 Integer Arithmetic 



The Integer Arithmetic instructions perform add, subtract, multiply, and divide operations
on word-length integers. Certain instructions in this class cause traps if signed or
unsigned overflow occurs during the execution of the instruction. There is support for
multi-precision arithmetic on operands whose lengths are multiples of words. All
instructions in this class set the ALU Status Register. The Integer Arithmetic instructions
are shown in Table 3-1.
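
The multi-precision support mentioned above rests on the carry bit linking one word-length operation to the next. The Python sketch below models a 64-bit addition built from an add followed by an add-with-carry. The helper names are hypothetical, and returning the carry as a second value is a modeling convenience standing in for the C bit of the ALU Status Register.

```python
MASK32 = 0xFFFFFFFF

def add(a, b):
    """Model of a 32-bit add: returns (DEST, C)."""
    total = (a & MASK32) + (b & MASK32)
    return total & MASK32, total >> 32

def addc(a, b, c):
    """Model of a 32-bit add with carry-in: returns (DEST, C)."""
    total = (a & MASK32) + (b & MASK32) + c
    return total & MASK32, total >> 32

def add64(a_hi, a_lo, b_hi, b_lo):
    """Add two 64-bit values held as (high word, low word) pairs."""
    lo, carry = add(a_lo, b_lo)        # low words first; produces the carry
    hi, _ = addc(a_hi, b_hi, carry)    # high words consume the carry
    return hi, lo
```

Longer operands follow the same pattern, with one add-with-carry step per additional word.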



3.3.2 Compare 



The Compare instructions test for various relationships between two values. For all 
Compare instructions except the CPBYTE instruction, the comparisons are performed 
on word-length signed or unsigned integers. There are two types of Compare instruc- 
tions. The first type places a Boolean value reflecting the outcome of the compare into 
a general-purpose register. For the second type (assert instructions), instruction 
execution continues only if the comparison is true; otherwise a trap occurs. The assert
instructions specify a vector for the trap (see Section 3.5.4).

The assert instructions support run-time operand checking and operating-system
calls. If the trap occurs in the User mode, and a trap number between 0 and 63 is
specified by the instruction, a Protection Violation trap occurs. The Compare instructions
are shown in Table 3-2.
Table 3-1 Integer Arithmetic Instructions

Mnemonic    Operation Description

ADD         DEST <- SRCA + SRCB
ADDS        DEST <- SRCA + SRCB
            IF signed overflow THEN Trap (Out of Range)
ADDU        DEST <- SRCA + SRCB
            IF unsigned overflow THEN Trap (Out of Range)
ADDC        DEST <- SRCA + SRCB + C
ADDCS       DEST <- SRCA + SRCB + C
            IF signed overflow THEN Trap (Out of Range)
ADDCU       DEST <- SRCA + SRCB + C
            IF unsigned overflow THEN Trap (Out of Range)
SUB         DEST <- SRCA - SRCB
SUBS        DEST <- SRCA - SRCB
            IF signed overflow THEN Trap (Out of Range)
SUBU        DEST <- SRCA - SRCB
            IF unsigned underflow THEN Trap (Out of Range)
SUBC        DEST <- SRCA - SRCB - 1 + C
SUBCS       DEST <- SRCA - SRCB - 1 + C
            IF signed overflow THEN Trap (Out of Range)
SUBCU       DEST <- SRCA - SRCB - 1 + C
            IF unsigned underflow THEN Trap (Out of Range)
SUBR        DEST <- SRCB - SRCA
SUBRS       DEST <- SRCB - SRCA
            IF signed overflow THEN Trap (Out of Range)
SUBRU       DEST <- SRCB - SRCA
            IF unsigned underflow THEN Trap (Out of Range)
SUBRC       DEST <- SRCB - SRCA - 1 + C
SUBRCS      DEST <- SRCB - SRCA - 1 + C
            IF signed overflow THEN Trap (Out of Range)
SUBRCU      DEST <- SRCB - SRCA - 1 + C
            IF unsigned underflow THEN Trap (Out of Range)
MULTIPLU    DEST <- SRCA • SRCB (unsigned)
MULTIPLY    DEST <- SRCA • SRCB (signed)
MUL         Perform one-bit step of a multiply operation (signed)
MULL        Complete a sequence of multiply steps
MULTM       DEST <- SRCA • SRCB (signed), most-significant bits
MULTMU      DEST <- SRCA • SRCB (unsigned), most-significant bits
MULU        Perform one-bit step of a multiply operation (unsigned)
DIVIDE      DEST <- (Q//SRCA) / SRCB (signed)
            Q <- Remainder
DIVIDU      DEST <- (Q//SRCA) / SRCB (unsigned)
            Q <- Remainder
DIV0        Initialize for a sequence of divide steps (unsigned)
DIV         Perform one-bit step of a divide operation (unsigned)
DIVL        Complete a sequence of divide steps (unsigned)
DIVREM      Generate remainder for divide operation (unsigned)
Table 3-2 Compare Instructions

Mnemonic    Operation Description

CPEQ        IF SRCA = SRCB THEN DEST <- TRUE
            ELSE DEST <- FALSE
CPNEQ       IF SRCA <> SRCB THEN DEST <- TRUE
            ELSE DEST <- FALSE
CPLT        IF SRCA < SRCB THEN DEST <- TRUE
            ELSE DEST <- FALSE
CPLTU       IF SRCA < SRCB (unsigned) THEN DEST <- TRUE
            ELSE DEST <- FALSE
CPLE        IF SRCA ≤ SRCB THEN DEST <- TRUE
            ELSE DEST <- FALSE
CPLEU       IF SRCA ≤ SRCB (unsigned) THEN DEST <- TRUE
            ELSE DEST <- FALSE
CPGT        IF SRCA > SRCB THEN DEST <- TRUE
            ELSE DEST <- FALSE
CPGTU       IF SRCA > SRCB (unsigned) THEN DEST <- TRUE
            ELSE DEST <- FALSE
CPGE        IF SRCA ≥ SRCB THEN DEST <- TRUE
            ELSE DEST <- FALSE
CPGEU       IF SRCA ≥ SRCB (unsigned) THEN DEST <- TRUE
            ELSE DEST <- FALSE
CPBYTE      IF (SRCA.BYTE0 = SRCB.BYTE0) OR
            (SRCA.BYTE1 = SRCB.BYTE1) OR
            (SRCA.BYTE2 = SRCB.BYTE2) OR
            (SRCA.BYTE3 = SRCB.BYTE3) THEN DEST <- TRUE
            ELSE DEST <- FALSE
ASEQ        IF SRCA = SRCB THEN Continue
            ELSE Trap (VN)
ASNEQ       IF SRCA <> SRCB THEN Continue
            ELSE Trap (VN)
ASLT        IF SRCA < SRCB THEN Continue
            ELSE Trap (VN)
ASLTU       IF SRCA < SRCB (unsigned) THEN Continue
            ELSE Trap (VN)
ASLE        IF SRCA ≤ SRCB THEN Continue
            ELSE Trap (VN)
ASLEU       IF SRCA ≤ SRCB (unsigned) THEN Continue
            ELSE Trap (VN)
ASGT        IF SRCA > SRCB THEN Continue
            ELSE Trap (VN)
ASGTU       IF SRCA > SRCB (unsigned) THEN Continue
            ELSE Trap (VN)
ASGE        IF SRCA ≥ SRCB THEN Continue
            ELSE Trap (VN)
ASGEU       IF SRCA ≥ SRCB (unsigned) THEN Continue
            ELSE Trap (VN)
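
Of the compare operations, CPBYTE is the least obvious, since it succeeds when any one of the four byte positions matches. A minimal Python sketch of that test follows. It returns a Python boolean and does not model how the Boolean result is represented in a general-purpose register; the function name `cpbyte` is used here only for illustration.

```python
def cpbyte(srca, srcb):
    """Return True if any byte of srca equals the byte at the same position in srcb."""
    return any(
        ((srca >> shift) & 0xFF) == ((srcb >> shift) & 0xFF)
        for shift in (0, 8, 16, 24)
    )
```

A typical use of such a test is scanning a word for a particular byte value, for example comparing against zero to find a terminating null character in a string.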
3.3.3 Logical

The Logical instructions perform a set of bit-by-bit Boolean functions on word-length
bit strings. All instructions in this class set the ALU Status Register. These instructions
are shown in Table 3-3.



Table 3-3 Logical Instructions

Mnemonic    Operation Description

AND         DEST <- SRCA & SRCB
ANDN        DEST <- SRCA & ~SRCB
NAND        DEST <- ~(SRCA & SRCB)
OR          DEST <- SRCA | SRCB
ORN         DEST <- SRCA | ~SRCB
NOR         DEST <- ~(SRCA | SRCB)
XOR         DEST <- SRCA ^ SRCB
XNOR        DEST <- ~(SRCA ^ SRCB)



3.3.4 Shift 

The Shift instructions (Table 3-4) perform arithmetic and logical shifts. All but the 
EXTRACT instruction operate on word-length data and produce a word-length result. 
The EXTRACT instruction operates on double-word data and produces a word-length 
result. If both parts of the double-word for the EXTRACT instruction are from the 
same source, the EXTRACT operation is equivalent to a rotate operation. For each 
operation, the shift count is a 5-bit integer, specifying a shift amount in the range of
0 to 31 bits.



Table 3-4 Shift Instructions

Mnemonic    Operation Description

SLL         DEST <- SRCA << SRCB (zero fill)
SRL         DEST <- SRCA >> SRCB (zero fill)
SRA         DEST <- SRCA >> SRCB (sign fill)
EXTRACT     DEST <- high-order word of (SRCA//SRCB << FC)
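
The EXTRACT operation, and its equivalence to a rotate when both halves of the double word come from the same source, can be checked with a short Python model. The functions `extract` and `rotate_left` are illustrative names; the shift count plays the role of the FC field.

```python
MASK32 = 0xFFFFFFFF

def extract(srca, srcb, fc):
    """High-order word of the 64-bit value srca//srcb shifted left by fc bits."""
    combined = ((srca & MASK32) << 32) | (srcb & MASK32)
    return ((combined << (fc & 0x1F)) >> 32) & MASK32

def rotate_left(x, n):
    """Rotate a 32-bit value left by n bits using the same source for both halves."""
    return extract(x, x, n)
```

With a count of 8, `extract(0x12345678, 0x9ABCDEF0, 8)` gives 0x3456789A: the top byte of the first operand is shifted out and the top byte of the second operand is shifted in.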



3.3.5 Data Movement 

The Data Movement instructions (Table 3-5) move bytes, half-words, and words 
between processor registers. In addition, they move data between general-purpose 
registers and external devices, memories, and the coprocessor. 

3.3.6 Constant 

The Constant instructions (Table 3-6) provide the ability to place half-word and word 
constants into registers. Most instructions in the instruction set allow an 8-bit constant 
as an operand. The Constant instructions allow the construction of larger constants.
Table 3-5 Data Movement Instructions

Mnemonic    Operation Description

LOAD        DEST <- EXTERNAL WORD [SRCB]
LOADL       DEST <- EXTERNAL WORD [SRCB]
            assert LOCK output during access
LOADSET     DEST <- EXTERNAL WORD [SRCB]
            EXTERNAL WORD [SRCB] <- h'FFFFFFFF'
            assert LOCK output during access
LOADM       DEST .. DEST + COUNT <-
            EXTERNAL WORD [SRCB] ..
            EXTERNAL WORD [SRCB + COUNT • 4]
STORE       EXTERNAL WORD [SRCB] <- SRCA
STOREL      EXTERNAL WORD [SRCB] <- SRCA
            assert LOCK output during access
STOREM      EXTERNAL WORD [SRCB] ..
            EXTERNAL WORD [SRCB + COUNT • 4] <-
            SRCA .. SRCA + COUNT
EXBYTE      DEST <- SRCB, with low-order byte replaced by byte in SRCA
            selected by BP
EXHW        DEST <- SRCB, with low-order half-word replaced by half-word in SRCA
            selected by BP
EXHWS       DEST <- half-word in SRCA selected by BP, sign-extended to 32 bits
INBYTE      DEST <- SRCA, with byte selected by BP replaced by low-order byte
            of SRCB
INHW        DEST <- SRCA, with half-word selected by BP replaced by low-order
            half-word of SRCB
MFSR        DEST <- SPECIAL
MFTLB       DEST <- TLB [SRCA]
MTSR        SPDEST <- SRCB
MTSRIM      SPDEST <- 0I16
MTTLB       TLB [SRCA] <- SRCB



Table 3-6 Constant Instructions

Mnemonic    Operation Description

CONST       DEST <- 0I16
CONSTH      Replace high-order half-word of SRCA by I16
CONSTHZ     Replace high-order half-word of SRCA with I16, and replace
            low-order half-word of SRCA with zeros.
CONSTN      DEST <- 1I16
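
Building a full word-length constant from two half-word pieces can be modeled in a few lines of Python. The sketch assumes that CONST places a 16-bit immediate in the low-order half-word with zeros above it, and that CONSTN does the same with ones above it; that reading of the table notation is an assumption here, since the nomenclature is defined in Section 8.1. The function names simply mirror the mnemonics.

```python
def const(i16):
    """Low-order half-word set to i16, high-order half-word cleared."""
    return i16 & 0xFFFF

def constn(i16):
    """Low-order half-word set to i16, high-order half-word set to ones."""
    return 0xFFFF0000 | (i16 & 0xFFFF)

def consth(reg, i16):
    """Replace the high-order half-word of reg with i16."""
    return ((i16 & 0xFFFF) << 16) | (reg & 0xFFFF)

def load_constant(value):
    """Construct a 32-bit constant with a CONST followed by a CONSTH."""
    reg = const(value & 0xFFFF)
    return consth(reg, value >> 16)
```

Small negative constants need only the single CONSTN step; for example, `constn(0xFFFE)` produces the two's-complement encoding of -2.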
3.3.7 Floating-Point



The Floating-Point instructions (Table 3-7) provide operations on single-precision 
(32-bit) or double-precision (64-bit) floating-point data. They also provide conversions 
between single-precision, double-precision, and integer number representations. 



3.3.8 Branch 



The Branch instructions (Table 3-8) control the execution flow of instructions. Branch 
target addresses may be absolute, relative to the Program Counter (with the offset 
given by a signed instruction constant), or contained in a general-purpose register. 
For conditional jumps, the outcome of the jump is based on a Boolean value in a 
general-purpose register. Procedure calls are unconditional, and save the return 
address in a general-purpose register. All branches have a delayed effect; the instruc- 
tion sequence following the branch is executed regardless of the outcome of the 
branch. 



3.3.9 Miscellaneous 



The Miscellaneous instructions (Table 3-9) perform various operations that cannot be 
grouped into other instruction classes. In certain cases, these are control functions 
available only to Supervisor-mode programs. 



3.3.10 Reserved Instructions 

Several Am29050 microprocessor operation codes are reserved for instruction emulation.
Each of these instructions causes a trap and sets the indirect pointers IPC, IPA,
and IPB. Some of these operation codes cause a trap to a unique trap vector, and 
others cause traps to shared trap vector 28. The relevant operation codes, and the 
corresponding trap vectors, are: 

Operation Codes (Hexadecimal) Trap Vector Numbers (Decimal) 

BF, CF-D6, DC 28 

DD 29 

E7 39 

F8 56 

FA-FF 58-63 

The reserved Instructions are intended for future processor enhancements, and 
users desiring compatibility with future processor versions should not use them for 
any purpose. 
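
The opcode-to-vector mapping in the list above can be sketched as a small lookup. This is an illustrative Python model only, not processor code; the function name `reserved_trap_vector` is hypothetical, and the `opcode - 0xC0` shortcut is an observation about the listed values rather than a rule stated in the text.

```python
# Reserved operation codes and their trap vectors, transcribed from the
# list in Section 3.3.10. All names here are illustrative.
SHARED_VECTOR = 28


def reserved_trap_vector(opcode):
    """Return the trap vector for a reserved opcode, or None if the
    opcode is not in the reserved list."""
    if opcode == 0xBF or 0xCF <= opcode <= 0xD6 or opcode == 0xDC:
        return SHARED_VECTOR
    if opcode in (0xDD, 0xE7, 0xF8) or 0xFA <= opcode <= 0xFF:
        # Each listed unique vector equals (opcode - 0xC0); this is an
        # observation about the table values, not a documented rule.
        return opcode - 0xC0
    return None
```

For example, `reserved_trap_vector(0xDD)` gives 29 and `reserved_trap_vector(0xD0)` gives the shared vector 28.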

3.4 DATA FORMATS AND HANDLING 

This section describes the various data types supported by the Am29050 micropro- 
cessor, and the mechanisms for accessing data in external devices and memories. 
The Am29050 microprocessor includes provisions for the external access of bytes, 
half-words, unaligned words, and unaligned half-words, as described in this section. 

3.4.1 Integer Data Types 

Most Am29050 microprocessor instructions deal directly with word-length integer 
data; integers may be either signed or unsigned, depending on the instruction. Some
instructions (e.g., AND) treat word-length operands as strings of bits. In addition, there 
is support for character, half-word, and Boolean data types. 
Table 3-7    Floating-Point Instructions

Mnemonic    Operation Description

FADD        DEST (single-precision) <- SRCA (single-precision)
                                       + SRCB (single-precision)

DADD        DEST (double-precision) <- SRCA (double-precision)
                                       + SRCB (double-precision)

FSUB        DEST (single-precision) <- SRCA (single-precision)
                                       - SRCB (single-precision)

DSUB        DEST (double-precision) <- SRCA (double-precision)
                                       - SRCB (double-precision)

FMUL        DEST (single-precision) <- SRCA (single-precision)
                                       * SRCB (single-precision)

FDMUL       DEST (double-precision) <- SRCA (single-precision)
                                       * SRCB (single-precision)

DMUL        DEST (double-precision) <- SRCA (double-precision)
                                       * SRCB (double-precision)

FDIV        DEST (single-precision) <- SRCA (single-precision)
                                       / SRCB (single-precision)

DDIV        DEST (double-precision) <- SRCA (double-precision)
                                       / SRCB (double-precision)

FMAC        ACC(ACN) (variable-precision) <- SRCA (single-precision)
                                             * SRCB (single-precision)
                                             + ACC(ACN) (variable-precision)

DMAC        ACC(ACN) (double-precision) <- SRCA (double-precision)
                                           * SRCB (double-precision)
                                           + ACC(ACN) (double-precision)

FMSM        DEST (single-precision) <- SRCA (single-precision)
                                       * ACC(0) (single-precision)
                                       + SRCB (single-precision)

DMSM        DEST (double-precision) <- SRCA (double-precision)
                                       * ACC(0) (double-precision)
                                       + SRCB (double-precision)

MFACC       DEST <- ACC(ACN)

MTACC       ACC(ACN) <- SRCA

FEQ         IF SRCA (single-precision) = SRCB (single-precision)
                THEN DEST <- TRUE
                ELSE DEST <- FALSE

DEQ         IF SRCA (double-precision) = SRCB (double-precision)
                THEN DEST <- TRUE
                ELSE DEST <- FALSE

FGE         IF SRCA (single-precision) >= SRCB (single-precision)
                THEN DEST <- TRUE
                ELSE DEST <- FALSE

DGE         IF SRCA (double-precision) >= SRCB (double-precision)
                THEN DEST <- TRUE
                ELSE DEST <- FALSE

FGT         IF SRCA (single-precision) > SRCB (single-precision)
                THEN DEST <- TRUE
                ELSE DEST <- FALSE

DGT         IF SRCA (double-precision) > SRCB (double-precision)
                THEN DEST <- TRUE
                ELSE DEST <- FALSE

SQRT        DEST (single-precision, double-precision)
                <- SQRT (SRCA (single-precision, double-precision))

CONVERT     DEST (integer, single-precision, double-precision)
                <- SRCA (integer, single-precision, double-precision)

CLASS       DEST <- CLASS (SRCA (single-precision, double-precision))



Table 3-8    Branch Instructions

Mnemonic    Operation Description

CALL        DEST <- PC//00 + 8
            PC <- TARGET
            Execute delay instruction

CALLI       DEST <- PC//00 + 8
            PC <- SRCB
            Execute delay instruction

JMP         PC <- TARGET
            Execute delay instruction

JMPI        PC <- SRCB
            Execute delay instruction

JMPT        IF SRCA = TRUE THEN PC <- TARGET
            Execute delay instruction

JMPTI       IF SRCA = TRUE THEN PC <- SRCB
            Execute delay instruction

JMPF        IF SRCA = FALSE THEN PC <- TARGET
            Execute delay instruction

JMPFI       IF SRCA = FALSE THEN PC <- SRCB
            Execute delay instruction

JMPFDEC     IF SRCA = FALSE THEN
                SRCA <- SRCA - 1
                PC <- TARGET
            ELSE
                SRCA <- SRCA - 1
            Execute delay instruction



Table 3-9    Miscellaneous Instructions

Mnemonic    Operation Description

SETIP       Set IPA, IPB, and IPC with operand register numbers

EMULATE     Load IPA and IPB with operand register numbers, and Trap (VN)

INV         Reset all Valid bits in Branch Target Cache memory to zeros

IRET        Perform an interrupt return sequence

IRETINV     Perform an interrupt return sequence, and reset all Valid bits in
            Branch Target Cache memory to zeros

HALT        Enter Halt mode
3.4.1.1 BYTE OPERATIONS

The processor supports character data through load, store, extraction, and insertion
operations on word-length operands, and by a compare operation on byte-length
fields within words. The format for unsigned and signed characters is shown in
Figure 3-41; for signed characters, the sign bit is the most-significant bit of the
character. For sequences of packed characters within words, bytes are ordered either
left-to-right or right-to-left, depending on the BO bit of the Configuration Register
(see Section 3.4.5.2).

Figure 3-41 Character Format

Unsigned:    bits 31-8: 000000000000000000000000    bits 7-0: Data

Signed:      bits 31-8: SSSSSSSSSSSSSSSSSSSSSSSS    bit 7: S    bits 6-0: Data



If the Data Width Enable (DW) bit of the Configuration Register is 1, the Am29050 
microprocessor is enabled to load and store byte data. On a load, an external packed 
byte is converted to one of the character formats shown in Figure 3-41. On a store,
the low-order byte of a word is packed into every byte of an external word. Section 
3.4.6 describes external byte accesses in more detail. 

The Extract Byte (EXBYTE) instruction replaces the low-order character of a
destination word with an arbitrary byte-aligned character from a source word.
For the EXBYTE instruction, the destination word can be a zero word, which
effectively zero-extends the character from the source operand.

The Insert Byte (INBYTE) instruction replaces an arbitrary byte-aligned character in a 
destination word with the low-order character of a source word. For the INBYTE in- 
struction, the source operand can be a character constant specified by the instruction. 

The Compare Bytes (CPBYTE) instruction compares two word-length operands and 
gives a result of TRUE if any corresponding bytes within the operands have equiva- 
lent values. This allows programs to detect characters within words without first hav- 
ing to extract individual characters, one at a time, from the word of interest. 
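
The three byte operations can be modeled on 32-bit integers as follows. This is a minimal Python sketch, not a description of the hardware: the function and parameter names are hypothetical, and the mapping from byte pointer to byte position (pointer 0 selects the high-order byte when BO = 0, the low-order byte when BO = 1) is an assumption based on the byte-ordering description in Section 3.4.5.2.

```python
MASK32 = 0xFFFFFFFF
TRUE = 0x80000000   # Boolean TRUE: 1 in the most-significant bit
FALSE = 0x00000000


def byte_shift(bp, bo):
    """Bit position of the byte selected by byte pointer bp (0-3).
    Assumption: with BO = 0, bp 0 is the high-order byte; with BO = 1,
    bp 0 is the low-order byte."""
    return 8 * bp if bo else 8 * (3 - bp)


def exbyte(source, dest, bp, bo=0):
    """Replace the low-order byte of dest with the selected byte of source."""
    selected = (source >> byte_shift(bp, bo)) & 0xFF
    return (dest & 0xFFFFFF00) | selected


def inbyte(dest, source, bp, bo=0):
    """Replace the selected byte of dest with the low-order byte of source."""
    shift = byte_shift(bp, bo)
    cleared = dest & ~(0xFF << shift) & MASK32
    return cleared | ((source & 0xFF) << shift)


def cpbyte(a, b):
    """TRUE if any pair of corresponding bytes in a and b is equal."""
    for shift in (0, 8, 16, 24):
        if ((a >> shift) & 0xFF) == ((b >> shift) & 0xFF):
            return TRUE
    return FALSE
```

Comparing a word against zero with `cpbyte(word, 0)` reports whether the word contains a zero byte, which is the kind of in-word character search the text describes.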

3.4.1.2 HALF-WORD OPERATIONS 

The processor supports half-word data through load, store, insertion and extraction 
operations on word-length operands. The format for unsigned and signed half-words 
is shown in Figure 3-42; for signed half-words, the sign bit is the most-significant bit of 
the half-word. For sequences of packed half-words within words, half-words are or- 
dered either left-to-right or right-to-left, depending on the Byte Order (BO) bit of the 
Configuration Register (see Section 3.4.5.2). 

If the Data Width Enable (DW) bit of the Configuration Register is 1, the Am29050
microprocessor is enabled to load and store half-word data. On a load, an external 
packed half-word is converted to one of the formats shown in Figure 3-42. On a store, 
the low-order half-word of a word is packed into every half-word of an external word. 
Section 3.4.5 describes external half-word accesses in more detail. 
Figure 3-42 Half-Word Format

Unsigned:    bits 31-16: 0000000000000000    bits 15-0: Data

Signed:      bits 31-16: SSSSSSSSSSSSSSSS    bit 15: S    bits 14-0: Data



The Extract Half-Word (EXHW) instruction replaces the low-order half-word of a desti- 
nation word with either the low-order or high-order half-word of a source word. For the 
EXHW instruction, the destination word can be a zero word, which effectively zero- 
extends the half-word from the source operand. 

The Extract Half-Word, Sign-Extended (EXHWS) instruction is similar to the EXHW 
instruction, except that it sign-extends the half-word in the destination word (i.e., it 
replaces the most-significant 16 bits of the destination word with the most-significant 
bit of the source half-word). 

The Insert Half-Word (INHW) instruction replaces either the low-order or high-order 
half-word in a destination word with the low-order half-word of a source word. 
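
A short Python model of the three half-word operations may help make the data movement concrete. It is a sketch under stated assumptions: the names are hypothetical, and it assumes that the high-order bit of a two-bit byte pointer selects the half-word, with a clear bit selecting the high-order half-word when BO = 0.

```python
MASK32 = 0xFFFFFFFF


def halfword_shift(bp, bo=0):
    """Bit position of the half-word selected by byte pointer bp (0-3).
    Assumption: only the high-order bit of bp matters, and with BO = 0
    a clear bit selects the high-order half-word."""
    upper_bit = (bp >> 1) & 1
    return 16 * upper_bit if bo else 16 * (1 - upper_bit)


def exhw(source, dest, bp, bo=0):
    """Replace the low-order half-word of dest with the selected
    half-word of source."""
    selected = (source >> halfword_shift(bp, bo)) & 0xFFFF
    return (dest & 0xFFFF0000) | selected


def exhws(source, bp, bo=0):
    """Extract the selected half-word and sign-extend it to 32 bits."""
    selected = (source >> halfword_shift(bp, bo)) & 0xFFFF
    if selected & 0x8000:          # sign bit of the half-word
        selected |= 0xFFFF0000     # copy it into the upper 16 bits
    return selected


def inhw(dest, source, bp, bo=0):
    """Replace the selected half-word of dest with the low-order
    half-word of source."""
    shift = halfword_shift(bp, bo)
    cleared = dest & ~(0xFFFF << shift) & MASK32
    return cleared | ((source & 0xFFFF) << shift)
```

With a zero destination word, `exhw(0x80017FFF, 0, 0)` yields 0x00008001, while `exhws(0x80017FFF, 0)` yields 0xFFFF8001.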

3.4.1.3 BOOLEAN DATA 

Some instructions in the Compare class generate word-length Boolean results. Also, 
conditional branches are conditional upon Boolean operands. The Boolean format 
used by the processor is such that the Boolean values TRUE and FALSE are repre- 
sented by a 1 or 0, respectively, in the most-significant bit of a word. The remaining 
bits are unimportant: for the compare instructions, they are reset. Note that two's- 
complement negative integers are indicated by the Boolean value TRUE in this en- 
coding scheme. 
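
The Boolean encoding can be illustrated with a few lines of Python. The helper names are hypothetical; the only facts taken from the text are that the most-significant bit carries the truth value and that compare instructions reset the remaining bits.

```python
TRUE = 0x80000000    # most-significant bit set, all other bits reset
FALSE = 0x00000000


def is_true(word):
    """A word is TRUE when its most-significant bit is 1; the other
    bits are ignored."""
    return bool(word & 0x80000000)


def compare_result(condition):
    """Boolean word as a compare instruction would produce it."""
    return TRUE if condition else FALSE
```

Because only bit 31 is examined, `is_true(0xFFFFFFFF)` (the two's-complement integer -1) is true, which is why a conditional branch can double as a test for a negative integer.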

3.4.2 Floating-Point Data Types 

The Am29050 microprocessor supports single- and double-precision floating-point 
formats that comply with the IEEE Standard for Binary Floating-Point Arithmetic 
(ANSI/IEEE Std. 754-1985). 

In this section, the following nomenclature is used to denote fields in a floating-point 
value: 

• s: sign bit 

• bexp: biased exponent 

• sig: significand 
3.4.2.1 SINGLE-PRECISION FLOATING-POINT

The format for a single-precision floating-point value is shown in Figure 3-43. 

Figure 3-43 Single-Precision Floating-Point Format

bit 31: s    bits 30-23: bexp    bits 22-0: frac

Typically, the value of a single-precision operand is expressed by: 

(-1)**s * 1.frac * 2**(bexp-127).

The encoding of special floating-point values is given in Section 3.4.3. 
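
As a worked illustration of the single-precision expression, the following Python sketch splits a 32-bit word into its fields and evaluates the formula for a normalized operand. The function names are hypothetical, and the field positions (s in bit 31, bexp in bits 30-23, frac in bits 22-0) are those of the IEEE 754 single format.

```python
def decode_single(word):
    """Split a 32-bit word into (s, bexp, frac)."""
    s = (word >> 31) & 0x1
    bexp = (word >> 23) & 0xFF
    frac = word & 0x7FFFFF
    return s, bexp, frac


def single_value(word):
    """Value of a normalized operand (0 < bexp < 255):
    (-1)**s * 1.frac * 2**(bexp-127)."""
    s, bexp, frac = decode_single(word)
    significand = 1 + frac / 2**23      # the "1.frac" term
    return (-1) ** s * significand * 2.0 ** (bexp - 127)
```

For example, the word 0x3FC00000 has s = 0, bexp = 127, and a fraction of 0.5, so its value is 1.5.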

3.4.2.2 DOUBLE-PRECISION FLOATING-POINT 

The format for a double-precision floating-point value is shown in Figure 3-44. 



Figure 3-44 Double-Precision Floating-Point Format

Word 0:    bit 31: s    bits 30-20: bexp    bits 19-0: frac (high-order 20 bits)

Word 1:    bits 31-0: frac (low-order 32 bits)



Typically, the value of a double-precision operand is expressed by: 

(-1)**s * 1.frac * 2**(bexp-1023).

The encoding of special floating-point values is given in Section 3.4.3. 

In order to be properly referenced by a floating-point instruction, a double-precision
floating-point value must be double-word aligned. The absolute-register number of the
register containing the first word (labeled 0 in Figure 3-44) must be even. The
absolute-register number of the register containing the second word (labeled 1 in
Figure 3-44) must be odd. If these conditions are not met, the results of the instruction
are unpredictable. Note that the appropriate registers for a double-precision value in
the local registers depend on the value of the Stack Pointer.
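
The double-precision layout and the register-pairing rule can be sketched in Python as follows. The names are hypothetical, and the sketch assumes that the first word holds the sign, the biased exponent, and the high-order 20 fraction bits, with the second word holding the low-order 32 fraction bits.

```python
def decode_double(word0, word1):
    """Split a two-word double-precision value into (s, bexp, frac).
    Assumption: word0 holds s, bexp, and the high 20 fraction bits;
    word1 holds the low 32 fraction bits."""
    s = (word0 >> 31) & 0x1
    bexp = (word0 >> 20) & 0x7FF
    frac = ((word0 & 0xFFFFF) << 32) | (word1 & 0xFFFFFFFF)
    return s, bexp, frac


def double_value(word0, word1):
    """Value of a normalized operand (0 < bexp < 2047):
    (-1)**s * 1.frac * 2**(bexp-1023)."""
    s, bexp, frac = decode_double(word0, word1)
    return (-1) ** s * (1 + frac / 2**52) * 2.0 ** (bexp - 1023)


def is_aligned_pair(first_register, second_register):
    """Check the double-word alignment rule on absolute register
    numbers: first word in an even register, second in the next odd one."""
    return first_register % 2 == 0 and second_register == first_register + 1
```

A value held in absolute registers 130 and 131 satisfies the rule; one held in 131 and 132 does not.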



3.4.3 Special Floating-Point Values 

The Am29050 microprocessor defines floating-point values which are encoded for 
special interpretation. The values are described in this section. 

3.4.3.1 NOT-A-NUMBER 

A Not-a-Number (NaN) is a symbolic value used to report certain floating-point
exceptions. It also can be used to implement user-defined extensions to floating-point
operations. A NaN comprises a floating-point number with maximum biased exponent
and non-zero fraction. The sign bit can be either 0 or 1, and has no significance.
There are two types of NaN: signaling NaNs and quiet NaNs. A signaling NaN causes
an Invalid Operation exception if used as an input operand to a floating-point
operation; a quiet NaN does not cause an exception. The Am29050 microprocessor
distinguishes signaling and quiet NaNs by the most-significant bit of the fraction: a 1
indicates a quiet NaN, and a 0 indicates a signaling NaN.

An operation never generates a signaling NaN as a result. A quiet NaN result can be 
generated in one of two ways: 

• As the result of an invalid operation that cannot generate a reasonable result, or 

• As the result of an operation for which one or more input operands are either 
signaling or quiet NaNs. 

In either case, the Am29050 microprocessor produces a quiet NaN having a fraction
of 11000 ... 0; that is, the two most-significant bits of the fraction are 11, and the
remaining bits are 0. If desired, the Reserved Operand exception can be enabled to
cause a Floating-Point Exception trap. The trap handler in this case can implement a
scheme whereby user-defined NaN values appear to pass through operations as
results, providing overall status for a series of operations.

3.4.3.2 INFINITY

Infinity is an encoded value used to represent a value that is too large to be
represented as a finite number in a given floating-point format. Infinity comprises a
floating-point number with maximum biased exponent and zero fraction. The sign bit
of an infinity distinguishes +∞ from -∞.

3.4.3.3 DENORMALIZED NUMBERS 

The IEEE Standard specifies that, wherever possible, a result that is too small to be 
represented as a normalized number be represented as a denormalized number. A 
denormalized number may be used as an input operand to any operation. For single- 
and double-precision formats, a denormalized number is a floating-point number with 
a biased exponent of zero and a non-zero fraction field; the sign bit can be either 1 or 
0. The value of a denormalized number is expressed by: 

(-1)**s * 0.frac * 2**(-bias+1),

where bias is the exponent bias for the format in question (127 for single precision, 
1023 for double precision). The handling of denormalized numbers is discussed in 
Appendix C. 

3.4.3.4 ZERO 

A zero is a floating-point number with a biased exponent of zero and a zero fraction
field. The sign bit of a zero can be either 0 or 1; however, positive and negative zero
are both exactly zero, and are considered equal by comparison operations.
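
The encodings of Sections 3.4.3.1 through 3.4.3.4 can be summarized as a classifier over single-precision words. This is an illustrative Python sketch with hypothetical names; it is unrelated to the result format of the CLASS instruction, which is not described here. The constant `DEFAULT_QUIET_NAN` assumes a sign bit of 0, which the text leaves open.

```python
def classify_single(word):
    """Name the special-value class of a single-precision word."""
    s = (word >> 31) & 0x1
    bexp = (word >> 23) & 0xFF
    frac = word & 0x7FFFFF
    if bexp == 0xFF:                        # maximum biased exponent
        if frac == 0:
            return "-infinity" if s else "+infinity"
        # Most-significant fraction bit: 1 = quiet, 0 = signaling.
        return "quiet NaN" if frac & 0x400000 else "signaling NaN"
    if bexp == 0:
        return "zero" if frac == 0 else "denormalized"
    return "normalized"


def denormalized_value(word):
    """(-1)**s * 0.frac * 2**(-bias+1) with bias = 127."""
    s = (word >> 31) & 0x1
    frac = word & 0x7FFFFF
    return (-1) ** s * (frac / 2**23) * 2.0 ** (-127 + 1)


# Quiet NaN with fraction 11000...0; the sign bit of 0 is an assumption.
DEFAULT_QUIET_NAN = 0x7FE00000
```

The smallest positive denormalized word, 0x00000001, evaluates to 2**-149 under the formula above.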

3.4.4 External Data Accesses 

All processor external accesses occur between general-purpose registers and exter- 
nal devices and memories. Accesses occur as the result of the execution of load and 
store instructions. The load and store instructions specify which general-purpose 
register receives the data (for a load) or supplies the data (for a store). The format of 
the load and store instructions is shown in Figure 3-45. 
Figure 3-45 Load/Store Instruction Format

bits 31-24: XXXXXXXM (operation code; M is bit 24)    bit 23: CE
bits 22-16: CNTL    bits 15-8: RA    bits 7-0: RB or I



Addresses for accesses are given either by the content of a general-purpose register
or by a constant value specified by the load or store instruction. The load and store
instructions do not perform address computation directly. Any required address
computations are performed explicitly by other instructions.

In the load or store instruction, the Coprocessor Enable (CE) bit (bit 23) determines 
whether or not the access is directed to the coprocessor. If the CE bit is 0, the access 
is directed to an external device or memory. If the CE bit is 1, data is transferred to or
from the coprocessor. The CE bit affects the interpretation of the Control (CNTL) field 
as well as the channel protocol. Coprocessor accesses are discussed in Chapter 6. 
This section deals with all other external accesses. 

The format of the instructions that do not perform coprocessor data transfers (i.e., in 
which the CE bit is 0) is shown in Figure 3-46. 



Figure 3-46 Non-Coprocessor Load/Store Format

bits 31-24: XXXXXXXM    bit 23: CE    bit 22: AS    bit 21: PA    bit 20: SB
bit 19: UA    bits 18-16: OPT    bits 15-8: RA    bits 7-0: RB or I



In load and store instructions, the RB or I field specifies the address for access. The 
address is either the content of a general-purpose register, with register number RB, 
or a constant with a value I (zero-extended to 32 bits). The M bit determines whether 
the register or the constant is used. 

The data for the access is written into the general-purpose register RA for a load, and 
is supplied by register RA for a store. 

The definitions for other fields in the load or store instruction are given below: 

Bit 23: Coprocessor Enable (CE). The CE bit is 0 for a non-coprocessor load or
store.

Bit 22: Address Space (AS). If the AS bit is 0 for an untranslated load or store, the
access is directed to instruction/data memory. If the AS bit is 1 for an untranslated
load or store, the access is directed to input/output. The AS bit must be 0 for a
translated load or store; if the AS bit is 1 for a translated load or store, a Protection
Violation trap occurs. The address space for a translated load or store is determined
by the Input/Output (IO) bit of the associated TLB entry.

Bit 21: Physical Address (PA). The PA bit may be used by a Supervisor-mode
program to disable address translation for an access. If the PA bit is 1, then address
translation is not performed for the access, regardless of the value of the Physical
Addressing/Data (PD) bit in the Current Processor Status Register. If the PA bit is 0,
address translation depends on the PD bit.

The PA bit may be 1 only for Supervisor-mode instructions. If it is 1 for a User-mode
instruction, a Protection Violation trap occurs.

Bit 20: Set Byte Pointer/Sign Bit (SB). If the Data Width Enable (DW) bit of the
Configuration Register is 0 and the SB bit is 1, the Byte Pointer Register is written
with the two least-significant bits of the address for the access. These address bits
can control subsequent character and half-word operations. If the SB bit is 0, the Byte
Pointer Register is not affected.

If the Data Width Enable (DW) bit of the Configuration Register is 1 and the SB bit is 1
for a load, the loaded byte or half-word is sign-extended in the destination register; if
the SB bit is 0, the byte or half-word is zero-extended. If the DW bit is 1 and the SB bit
is 1 for either a load or store, then each bit of the Byte Pointer Register is written with
the complement of the Byte Order bit of the Configuration Register. The Byte Pointer
Register is set in this case to provide software compatibility across different types of
memory systems. If the SB bit is 0, the Byte Pointer Register is not affected.

Bit 19: User Access (UA). The UA bit allows programs executing in the Supervisor
mode to emulate User-mode accesses. This allows checking of the authorization of
an access requested by a User-mode program. It also causes address translation (if
applicable) to be performed using the PID field of the MMU Configuration Register,
rather than the fixed Supervisor-mode process identifier zero.

If the UA bit is 1 for a Supervisor-mode load or store, the access associated with the
instruction is performed in the User mode. In this case, the User mode affects only
TLB protection-checking, the SUP/US output, and the use of the PID field in
translation; it has no effect on the registers that can be accessed by the instruction.
If the UA bit is 0, the program mode for the access is controlled by the SM bit.

If the UA bit is 1 for a User-mode load or store, a Protection Violation trap occurs. 

Bits 18-16: Option (OPT). This field is placed on the OPT(2-0) outputs during the
address cycle of the access. There is a one-to-one correspondence between the OPT
field and the OPT(2-0) outputs; that is, the most-significant OPT bit is placed on
OPT2, and so on.

The OPT field controls system functions as described below. 

Bits 15-8: (RA). The data for the access is written into the general-purpose register
RA for a load, and is supplied by register RA for a store.

Bits 7-0: (RB or I). In load and store instructions, the RB or I field specifies the
address for the access. The address is either the content of a general-purpose
register, with register number RB, or a constant value I (zero-extended to 32 bits).
The M bit of the operation code (bit 24) determines whether the register or the
constant is used.
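
The field definitions above translate directly into shifts and masks. The following Python sketch decodes a non-coprocessor load or store word and applies the three Protection Violation conditions named in the field descriptions. The function names are hypothetical, the example word in the usage note is arbitrary rather than a documented encoding, and the caller is assumed to know whether the access is translated.

```python
def decode_load_store(instruction):
    """Extract the fields of a 32-bit load/store instruction word."""
    return {
        "opcode": (instruction >> 24) & 0xFF,   # bits 31-24, includes M
        "M": (instruction >> 24) & 0x1,         # bit 24
        "CE": (instruction >> 23) & 0x1,        # bit 23
        "AS": (instruction >> 22) & 0x1,        # bit 22
        "PA": (instruction >> 21) & 0x1,        # bit 21
        "SB": (instruction >> 20) & 0x1,        # bit 20
        "UA": (instruction >> 19) & 0x1,        # bit 19
        "OPT": (instruction >> 16) & 0x7,       # bits 18-16
        "RA": (instruction >> 8) & 0xFF,        # bits 15-8
        "RB_or_I": instruction & 0xFF,          # bits 7-0
    }


def protection_violation(fields, supervisor_mode, translated):
    """True if the field settings alone call for a Protection Violation
    trap: PA or UA set in User mode, or AS set on a translated access."""
    if not supervisor_mode and (fields["PA"] or fields["UA"]):
        return True
    if translated and fields["AS"]:
        return True
    return False
```

Decoding the arbitrary word 0x16118081 gives SB = 1, OPT = 1, RA = 0x80, and RB or I = 0x81, with all other control bits 0.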

Load and store operations are overlapped with the execution of instructions that 
follow the load or store instruction. Only one load or store may be in progress on any 
given cycle. If a load or store instruction is encountered while another load or store 
operation is in progress, the processor enters the Pipeline Hold mode until the first 
operation completes. However, the address for the second operation may appear on 
the Address Bus if the first operation is to a device or memory that supports pipelined 
operations (see Section 5.2.8). 
3.4.4.1 LOAD OPERATIONS

The processor provides the following instructions for performing load operations: Load 
(LOAD), Load and Lock (LOADL), Load and Set (LOADSET), and Load Multiple 
(LOADM). All of these instructions transfer data from an external device or memory
into one or more general-purpose registers. 

The LOADL instruction supports the implementation of device and memory interlocks
in a multi-processor configuration. It activates the LOCK output during the address 
cycle of the access. 

The LOADSET instruction implements a binary semaphore. It loads a general-purpose
register and atomically writes the accessed location with a word which has 1
in every bit position (that is, the write is indivisible from the read). The LOCK output is
asserted during both the read and write access. Note that, if address translation is
enabled for the LOADSET instruction, the TLB memory-protection bits must allow
both the read and write access. If either the read or write access is not allowed,
neither access is performed.

The LOADM instruction loads a specified number of registers from sequential addresses, as
explained below. 

Load operations are overlapped with the execution of instructions that follow the load
instruction. The processor detects any dependencies on the loaded data that
subsequent instructions may have, and, if such a dependency is detected, enters the
Pipeline Hold mode until the data is returned by the external device or memory. If a
register that is the target of an incomplete load is written with the result of a subsequent
instruction, the processor does not write the returning data into the register when the
load completes; the Not Needed (NN) bit in the Channel Control Register is set in
this case.

Whenever possible, the Am29050 microprocessor performs an early load, making the 
physical address available at the end of the decode cycle of the load instruction. At 
the beginning of the next cycle, when the load enters the execute stage, the physical 
address appears on the channel. Early loads reduce the effective external access 
time by one cycle. The hardware that supports early loads is discussed in Section 4.3. 

3.4.4.2 STORE OPERATIONS 

The processor provides the following instructions for performing store operations: 
Store (STORE), Store and Lock (STOREL), and Store Multiple (STOREM). All of 
these instructions transfer data from one or more general-purpose registers to an 
external device or memory. 

The STOREL instruction supports the implementation of device and memory
interlocks in a multi-processor configuration. It activates the LOCK output during the
address cycle of the access.

The STOREM instruction stores a specified number of registers to sequential ad- 
dresses, as explained below. 

Store operations are overlapped with the execution of instructions that follow the store 
instruction. However, no data dependencies can exist, since the store prevents any 
subsequent accesses until it completes. 

3.4.4.3 MULTIPLE ACCESSES 

Load Multiple (LOADM) and Store Multiple (STOREM) instructions move contiguous 
words of data between general-purpose registers and external devices and memo- 
ries. The number of transfers is determined by the Load/Store Count Remaining 
Register. 
The Load/Store Count Remaining (CR) field in the Load/Store Count Remaining
Register specifies the number of transfers to be performed by the next LOADM or
STOREM executed in the instruction sequence. The CR field is in the range of 0 to
255, and is zero-based: a count value of 0 represents one transfer, and a count value
of 255 represents 256 transfers. The CR field also appears in the Channel Control
Register.

Before a LOADM or STOREM is executed, the CR field is set by a Move To Special 
Register. A LOADM or STOREM uses the most-recently written value of the CR field. 
If an attempt is made to alter the CR field, and the Channel Control Register contains 
information for an external access that has not yet completed, the processor enters 
the Pipeline Hold mode until the access completes. Note that since the CR is set 
independently of the LOADM and STOREM, the CR field may represent valid state of 
an interrupted program even if the Contents Valid (CV) bit of the Channel Control 
Register is 0. 

Because of the pipelined implementation of LOADM and STOREM, at least one in- 
struction (e.g., the instruction that sets the CR field) must separate two successive 
LOADM and/or STOREM instructions. 

After the CR field is set, the execution of a LOADM or STOREM begins the data 
transfer. As with any other load or store operation, the LOADM or STOREM waits until 
any pending load or store operation is complete before starting. The LOADM instruc- 
tion specifies the starting address and starting destination general-purpose register. 
The STOREM instruction specifies the starting address and the starting source 
general-purpose register. 

During the execution of the LOADM or STOREM instruction, the processor updates 
the address and register number after every access, incrementing the address by four 
and the register number by 1. This continues until either all accesses are completed
or an interrupt or trap is taken. 

For a load-multiple or store-multiple address sequence, addresses wrap from the 
largest possible value (hexadecimal FFFFFFFC) to the smallest possible value (hexa- 
decimal 00000000). 

The processor increments absolute register numbers during the load-multiple or 
store-multiple sequence. Absolute-register numbers wrap from 127 to 128, and from 
255 to 128. Thus, a sequence that begins in the global registers may transition to the 
local registers, but a sequence that begins in the local registers remains in the local 
registers. Also, note that the local registers are addressed circularly. 
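
The address and register stepping of a load-multiple or store-multiple sequence can be modeled as a generator. This is a hedged Python sketch with hypothetical names: it assumes a word-aligned starting address, treats register numbers as absolute numbers, and ignores interrupts, traps, and protection checks.

```python
def multiple_access_sequence(start_address, start_register, cr):
    """Yield (address, absolute_register) for each transfer.

    cr is the zero-based count: cr = 0 means one transfer and
    cr = 255 means 256 transfers. start_address is assumed to be
    word-aligned."""
    address = start_address
    register = start_register
    for _ in range(cr + 1):
        yield address, register
        # Addresses step by four and wrap from FFFFFFFC to 00000000.
        address = (address + 4) & 0xFFFFFFFF
        # Registers step by one; 127 leads to 128, and 255 wraps to 128.
        register = 128 if register == 255 else register + 1
```

For example, `list(multiple_access_sequence(0xFFFFFFF8, 254, 3))` visits registers 254, 255, 128, and 129 at addresses FFFFFFF8, FFFFFFFC, 00000000, and 00000004.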

The normal restrictions on register accesses apply for the load-multiple and store- 
multiple sequences. For example, if a protected general-purpose register is encoun- 
tered in the sequence for a User-mode program, a Protection Violation trap occurs. 

Intermediate addresses are stored in the Channel Address Register, and register 
numbers are stored in the Target Register (TR) field of the Channel Control Register. 
For the STOREM instruction, the data for every access is stored in the Channel Data 
Register (this register also is set during the execution of the LOADM instruction, but 
has no interpretation in this case). The CR field is updated on the completion of every 
access, so that it indicates the number of accesses remaining in the sequence. 

Load-multiple and store-multiple operations are indicated by the Multiple Operation 
(ML) bit in the Channel Control Register. The ML bit is used to restart a multiple op- 
eration on an interrupt return; if it is set independently by a Move To Special Register 
before a load or store instruction is executed, the results are unpredictable. 

While a multiple load or store is executing, the processor is in the Pipeline Hold mode, 
suspending any subsequent instruction execution until the multiple access completes. 
If an interrupt or trap is taken, the Channel Address, Channel Data, and Channel
Control registers contain the state of the multiple access at the point of interruption.
The multiple access may be resumed at this point, at a later time, by an interrupt
return.

The processor attempts to complete multiple accesses using the burst-mode
capability of the channel (see Section 5.2.9). For this reason, multiple accesses of
individual bytes and half-words are not supported. If the burst-mode access is
preempted, the processor retransmits the address at the point of preemption. If the
external device or memory cannot support burst-mode accesses, the processor
transmits an address for every access. If the address sequence causes a virtual
page-boundary crossing, the processor preempts the burst-mode access, translates
the address for the new page, and re-establishes the burst-mode access using the
new physical address.



3.4.4.4 OPTION BITS 



The Option field in the load and store instructions supports system functions, such as
byte and half-word accesses. The definition of this field for a load or store, depending
on the AS bit of the instruction, is as follows:

AS    OPT2    OPT1    OPT0    Meaning

x     0       0       0       Word-length access
x     0       0       1       Byte access
x     0       1       0       Half-word access
0     1       0       0       Instruction ROM access (as data)
0     1       0       1       Cache control
0     1       1       0       Hardware-development system accesses
      All others              Reserved



Note that some of these encodings do not affect processor operation, and could have
other interpretations in a particular system. For example, the OPT values 000, 001,
and 010 affect processor operation only if the DW bit of the Configuration Register is
1. However, non-standard uses of the OPT field have an implication on the portability
of software between different systems.
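
The Option encodings can be expressed as a small lookup. This Python sketch is illustrative only: the function name is hypothetical, and it assumes that the encodings are 000 (word), 001 (byte), and 010 (half-word) for either AS value, and 100 (instruction ROM), 101 (cache control), and 110 (hardware-development system) when AS is 0, with every other combination reserved.

```python
def opt_meaning(as_bit, opt):
    """Describe an (AS, OPT) combination; opt is the 3-bit OPT field
    with OPT2 as its most-significant bit."""
    width = {
        0b000: "word-length access",
        0b001: "byte access",
        0b010: "half-word access",
    }
    system = {
        0b100: "instruction ROM access (as data)",
        0b101: "cache control",
        0b110: "hardware-development system access",
    }
    if opt in width:              # AS is a don't-care for these
        return width[opt]
    if as_bit == 0 and opt in system:
        return system[opt]
    return "reserved"
```

Under these assumptions, `opt_meaning(0, 0b100)` names the instruction ROM access, while the same OPT value with AS = 1 is reserved.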

3.4.5 Addressing and Alignment 

3.4.5.1 ADDRESS SPACES 

External instructions and data are contained in one of five 32-bit address spaces:

1. Data Memory

2. Input/Output

3. Coprocessor

4. Instruction Read-Only Memory (Instruction ROM)

5. Instruction Random-Access Memory (Instruction RAM)

An address in the instruction/data memory address space may be treated as virtual or 
physical, as determined by the Current Processor Status Register. Address transla- 
tion for data accesses is enabled separately from address translation for instruction 
accesses. A program in the Supervisor mode may temporarily disable address trans- 
lation for individual loads and stores; this permits load-real and store-real operations. 

It is possible to partition physical instruction and data addresses into two separate 
physical address spaces. However, virtual instruction and data addresses appear in 
the same virtual address space (i.e., instruction/data memory). 
The coprocessor address space is not an address space in the strictest sense. The
coprocessor address space is defined so that transfers of operands and operation
codes to the coprocessor do not interfere with other external devices and memories.

The processor does not directly support the access of the instruction ROM or
instruction RAM address spaces using loads and stores; this capability is defined as
a system option requiring external hardware.

For untranslated data accesses, bits contained in load and store instructions
distinguish between the instruction/data memory, input/output, and coprocessor
address spaces. For translated data accesses, the Input/Output bit of the associated
TLB entry distinguishes between the instruction/data memory and input/output
address spaces.

For instruction fetches, the ROM Enable (RE) bit of the Current Processor Status
Register distinguishes between the instruction/data and instruction ROM address
spaces.

3.4.5.2 BYTE AND HALF-WORD ADDRESSING

The Am29050 microprocessor generates word-oriented byte addresses for accesses 
to external devices and memories. Addresses are word-oriented because loads, 
stores, and instruction fetches access words. However, addresses are byte addresses 
because they are sufficient to select bytes packed within accessed words. For load 
and store operations, the processor provides means for using the least-significant 
address bits to access bytes and half-words within external words. 

The selection of a byte within a word is determined by the two least-significant bits of
an address and the Byte Order (BO) bit of the Configuration Register. The selection of
a half-word within a word is determined by the next-to-least-significant bit of an
address and the BO bit. Figure 3-47 illustrates the addressing of bytes and half-words
when the BO bit is 0 (big endian), and Figure 3-48 illustrates the addressing of bytes
and half-words when the BO bit is 1 (little endian). In Figure 3-47 and Figure 3-48,
addresses are represented in hexadecimal notation.



Figure 3-47 Byte and Half-Word Addressing with BO = 0 (Big Endian)

                 Bits 31-24    Bits 23-16    Bits 15-8     Bits 7-0

Word 00000000
  Half-Word      00000000                    00000002
  Byte           00000000      00000001      00000002      00000003

Word 00000004
  Half-Word      00000004                    00000006
  Byte           00000004      00000005      00000006      00000007

Word FFFFFFF8
  Half-Word      FFFFFFF8                    FFFFFFFA
  Byte           FFFFFFF8      FFFFFFF9      FFFFFFFA      FFFFFFFB

Word FFFFFFFC
  Half-Word      FFFFFFFC                    FFFFFFFE
  Byte           FFFFFFFC      FFFFFFFD      FFFFFFFE      FFFFFFFF

Figure 3-48 Byte and Half-Word Addressing with BO = 1 (Little Endian)

                 Bits 31-24    Bits 23-16    Bits 15-8     Bits 7-0

Word 00000000
  Half-Word      00000002                    00000000
  Byte           00000003      00000002      00000001      00000000

Word 00000004
  Half-Word      00000006                    00000004
  Byte           00000007      00000006      00000005      00000004

Word FFFFFFF8
  Half-Word      FFFFFFFA                    FFFFFFF8
  Byte           FFFFFFFB      FFFFFFFA      FFFFFFF9      FFFFFFF8

Word FFFFFFFC
  Half-Word      FFFFFFFE                    FFFFFFFC
  Byte           FFFFFFFF      FFFFFFFE      FFFFFFFD      FFFFFFFC



In the processor, the two least-significant bits of an external address can be reflected
in the Byte Pointer (BP) field of the ALU Status Register when the DW bit of the
Configuration Register is 0. Alternatively, the two least-significant bits of the address can
be used to control byte and half-word accesses when the DW bit is 1. The BO bit
affects only the interpretation of the BP field and the two least-significant address bits.

If the BO bit is 0, bytes are ordered within words such that a 00 in the BP field or in
the two least-significant address bits selects the high-order byte of a word, and a 11
selects the low-order byte. If the BO bit is 1, a 00 in the BP field or in the two
least-significant address bits selects the low-order byte of a word, and a 11 selects the
high-order byte.

If the BO bit is 0, half-words are ordered within words such that a 0 in the
most-significant bit of the BP field or the next-to-least-significant address bit selects the
high-order half-word, and a 1 selects the low-order half-word. If the BO bit is 1, a 0 in
the most-significant bit of the BP field or the next-to-least-significant address bit
selects the low-order half-word of a word, and a 1 selects the high-order half-word. Note
that since the least-significant bit of the BP field or an address does not participate in
the selection of half-words, the alignment of half-words is forced to half-word
boundaries in this case.
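
The byte-ordering rules above can be sketched in a few lines of Python. This is an illustrative model only, not part of the processor definition: the function names are hypothetical, and a word is modeled as a 32-bit integer whose bits 31-24 form the high-order byte.

```python
def byte_shift(addr: int, bo: int) -> int:
    """Bit position of the byte selected by the two low-order address bits."""
    low2 = addr & 0x3
    # BO = 0 (big endian): 00 selects the high-order byte (bits 31-24).
    # BO = 1 (little endian): 00 selects the low-order byte (bits 7-0).
    return 8 * (3 - low2) if bo == 0 else 8 * low2


def halfword_shift(addr: int, bo: int) -> int:
    """Bit position of the selected half-word; address bit 0 is ignored."""
    a1 = (addr >> 1) & 0x1
    return 16 * (1 - a1) if bo == 0 else 16 * a1


def select_byte(word: int, addr: int, bo: int) -> int:
    return (word >> byte_shift(addr, bo)) & 0xFF


def select_halfword(word: int, addr: int, bo: int) -> int:
    return (word >> halfword_shift(addr, bo)) & 0xFFFF
```

Because `halfword_shift` never examines address bit 0, an odd half-word address selects the same half-word as the even address below it, which is the forced alignment described above.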

3.4.5.3 ALIGNMENT OF WORDS AND HALF-WORDS

Since only byte addressing is supported, it is possible that an address for the access
of a word or half-word is not aligned to the desired word or half-word. The Am29050
microprocessor either ignores or forces alignment in most cases. However, some
systems may require that unaligned accesses be supported, for compatibility reasons.
Because of this, the Am29050 microprocessor provides an option that creates a trap
when a non-aligned access is attempted. This trap allows software emulation of the
non-aligned accesses, in a manner which is appropriate for the particular system.

The detection of unaligned accesses is activated by a 1 in the Trap Unaligned Access
(TU) bit of the Current Processor Status Register. Unaligned-access detection is
based on the data length as indicated by the OPT field of a load or store instruction,
and on the two least-significant bits of the specified address. Only addresses for
instruction/data memory accesses are checked; alignment is ignored for input/output
accesses and coprocessor transfers.

An Unaligned Access trap occurs only if the TU bit is 1 and any of the following com- 
binations of OPT field and address bits is detected for a load or store to instruction/ 
data memory: 

   OPT2   OPT1   OPT0   A1   A0   Meaning

   0      0      0      0    1    Unaligned Word Access
   0      0      0      1    0    Unaligned Word Access
   0      0      0      1    1    Unaligned Word Access
   0      1      0      0    1    Unaligned Half-Word Access
   0      1      0      1    1    Unaligned Half-Word Access

The trap handler for the Unaligned Access trap is responsible for generating the 
correct sequence of aligned accesses and performing any necessary shifting, mask- 
ing and/or merging. Note that a virtual page-boundary crossing also may have to be 
considered. 
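
As a rough illustration of the detection rule and of the shifting and merging a trap handler might perform, consider the following Python sketch. The names are hypothetical. It assumes OPT = 000 denotes a word access and OPT = 010 a half-word access, assumes big-endian byte order (BO = 0) for the emulated load, assumes the access is already known to target instruction/data memory, and ignores virtual page-boundary crossings.

```python
OPT_WORD = 0b000       # assumed encoding for a word-length access
OPT_HALF_WORD = 0b010  # half-word encoding, as used in this section


def is_unaligned_access(tu: int, opt: int, addr: int) -> bool:
    """True if a load or store to instruction/data memory would trap."""
    if tu != 1:
        return False
    low2 = addr & 0x3
    if opt == OPT_WORD:
        return low2 != 0            # A1 or A0 set
    if opt == OPT_HALF_WORD:
        return (low2 & 0x1) == 1    # A0 set
    return False                    # byte accesses are always aligned


def emulate_unaligned_word_load(read_word, addr: int) -> int:
    """Build an unaligned word from two aligned word reads (big endian)."""
    base = addr & ~0x3 & 0xFFFFFFFF
    offset = addr & 0x3
    if offset == 0:
        return read_word(base)
    first = read_word(base)
    second = read_word((base + 4) & 0xFFFFFFFF)
    combined = (first << 32) | second          # 8 consecutive bytes
    return (combined >> (8 * (4 - offset))) & 0xFFFFFFFF
```

Here `read_word` stands for whatever aligned word access the handler uses; a real handler would issue aligned loads and use shift and merge instructions instead.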

3.4.5.4 ALIGNMENT OF INSTRUCTIONS 

In the Am29050 microprocessor, all instructions are 32 bits in length, and are aligned 
on word-address boundaries. The processor's Program Counter is 30 bits in length, 
and the least-significant two bits of processor-generated instruction addresses are 
always 00. An unaligned address can be generated by indirect jumps and calls. How- 
ever, alignment is ignored by the processor in this case, and it expects the system to 
force alignment (i.e., by interpreting the two least-significant address bits as 00, re- 
gardless of their values). 

3.4.5.5 ACCESSING INSTRUCTIONS AS DATA 

To aid the external access of instructions and data on separate buses, the processor
distinguishes between instruction and data accesses. However, it does not support a
logical distinction between instruction and data address spaces (except in the case of
instruction read-only memory). In particular, address translation in the Memory
Management Unit is in no way affected by this distinction (although memory protection is).

In systems where it is necessary to access instructions as data, this function should
be performed via the shared address space. The OPT field provides a means for
loads to access instructions in the instruction read-only memory (ROM) address
space. The Am29050 microprocessor does not take any action to prevent a store to
the instruction ROM address space.

3.4.6 Byte and Half-Word Accesses

The Am29050 microprocessor can perform byte and half-word accesses in either
software or hardware, under control of the Data Width Enable (DW) bit of the
Configuration Register. Software byte and half-word accesses are selected by a DW bit of 0,
and hardware byte and half-word accesses are selected by a DW bit of 1. Software
byte and half-word accesses are less efficient than hardware byte and half-word
accesses, but hardware accesses require that the system be able to selectively write
individual byte and half-word positions within external devices and memories. The
software-only technique is compatible with systems designed to provide hardware
support for byte and half-word accesses.

This section describes the operation of both software and hardware byte and
half-word accesses. Byte and half-word accesses operate as described here for memory
and input/output accesses, but not for coprocessor transfers. Coprocessor transfers
are unaffected by the DW bit.

The DW bit is cleared by a processor reset. It must explicitly be set to 1 by software
before hardware byte and half-word accesses can be performed.

3.4.6.1 SOFTWARE BYTE AND HALF-WORD ACCESSES 

If the DW bit is 0, the Am29050 microprocessor allows the Byte Pointer Register to be 
set with the least-significant bits of an address specified by any load or store instruc- 
tion, except those that transfer information to and from the coprocessor. Insert and 
extract instructions can then be used to access the byte or half-word of interest, after 
the external word has been accessed. This provides a general-purpose mechanism 
for manipulating external byte and half-word data, without the need for external hard- 
ware support. 

To load a byte or half-word, a word load first is performed. This load sets the BP field
with the two least-significant bits of the address. A subsequent EXBYTE, EXHW or
EXHWS instruction extracts the byte or half-word of interest from the accessed word.

To store a byte or half-word, a load is first performed, setting the BP field with the two
least-significant bits of the address. A subsequent INBYTE or INHW instruction inserts
the byte or half-word of interest into the accessed word, and the resulting word then is
stored.

Software which relies on loads and stores setting the BP field cannot operate
correctly when the Freeze (FZ) bit of the Current Processor Status Register is 1, because
the ALU Status Register is frozen.
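
The software sequences above can be modeled as follows. This is a simplified sketch, not a definition of the instructions: memory is a hypothetical dictionary of aligned words, `exbyte` and `inbyte` model only the byte selection performed by EXBYTE and INBYTE (with the extract zero-extended, as when the second source operand is 0), and the BP field is passed explicitly instead of living in the ALU Status Register.

```python
def _shift(bp: int, bo: int) -> int:
    # BO = 0: BP 00 selects the high-order byte; BO = 1: the low-order byte.
    return 8 * (3 - bp) if bo == 0 else 8 * bp


def load_word_set_bp(memory: dict, addr: int):
    """Word load with DW = 0: returns the word and the new BP value."""
    return memory[addr & ~0x3 & 0xFFFFFFFF], addr & 0x3


def exbyte(word: int, bp: int, bo: int) -> int:
    """Extract the byte selected by BP, zero-extended."""
    return (word >> _shift(bp, bo)) & 0xFF


def inbyte(word: int, bp: int, bo: int, data: int) -> int:
    """Insert the low-order byte of data at the position selected by BP."""
    shift = _shift(bp, bo)
    return (word & ~(0xFF << shift) & 0xFFFFFFFF) | ((data & 0xFF) << shift)


def load_byte_software(memory: dict, addr: int, bo: int) -> int:
    word, bp = load_word_set_bp(memory, addr)   # load sets BP
    return exbyte(word, bp, bo)                 # extract selected byte


def store_byte_software(memory: dict, addr: int, data: int, bo: int) -> None:
    word, bp = load_word_set_bp(memory, addr)   # load sets BP
    memory[addr & ~0x3 & 0xFFFFFFFF] = inbyte(word, bp, bo, data)
```

The store path shows why the software technique needs no byte strobes: the external memory only ever sees full-word reads and writes.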

3.4.6.2 HARDWARE BYTE AND HALF-WORD ACCESSES 

If the DW bit is 1 on a load, the Am29050 microprocessor selects a byte or half-word
from the loaded word depending on: the Option (OPT) bits of the load instruction, the
Byte Order (BO) bit of the Configuration Register, and the two least-significant bits of
the address (for bytes) or the next-to-least-significant bit of the address (for
half-words). The selected byte or half-word is right-justified within the destination register.
If the SB bit of the load instruction is 0, the remainder of the destination register is
zero-extended. If the SB bit is 1, the remainder of the destination register is
sign-extended with the sign bit of the selected byte or half-word.

If the DW bit is 1 on a store, the Am29050 microprocessor replicates the low-order
byte or half-word in the source register into every byte and half-word position of the
stored word. The system is responsible for generating the appropriate byte and/or
half-word strobes, based on the OPT(2-0) signals and the two least-significant bits of
the address, to write the appropriate byte or half-word in the selected device or
memory (the system byte order must also be considered). The SB bit does not affect the
operation of a store, except for setting the BP field as described below.
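
A minimal Python model of the hardware load and store behavior just described, assuming DW = 1. The function names are hypothetical; OPT = 001 and OPT = 010 denote byte and half-word accesses as in the instruction sequences of this section, and any other OPT value is treated here as a plain word access.

```python
def hw_load(word: int, addr: int, opt: int, sb: int, bo: int) -> int:
    """Value placed in the destination register for a load with DW = 1."""
    if opt == 0b001:                       # byte
        width = 8
        low2 = addr & 0x3
        shift = 8 * (3 - low2) if bo == 0 else 8 * low2
    elif opt == 0b010:                     # half-word
        width = 16
        a1 = (addr >> 1) & 0x1
        shift = 16 * (1 - a1) if bo == 0 else 16 * a1
    else:                                  # treated as a word access
        return word & 0xFFFFFFFF
    mask = (1 << width) - 1
    value = (word >> shift) & mask         # right-justify
    if sb == 1 and value & (1 << (width - 1)):
        value |= 0xFFFFFFFF & ~mask        # sign-extend
    return value


def replicate_store_data(source: int, opt: int) -> int:
    """Word driven by the processor for a store with DW = 1."""
    if opt == 0b001:
        return (source & 0xFF) * 0x01010101
    if opt == 0b010:
        return (source & 0xFFFF) * 0x00010001
    return source & 0xFFFFFFFF
```

Replicating the store data into every position means the system only has to assert the right strobe; it never has to shift the data into place.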

If the SB bit is 1 for either a load or store, and the DW bit is also 1, both bits of the BP
field are set to the complement of the BO bit when the load or store is executed. This
does not directly affect the load or store access, but supports compatibility for
software developed for word-write-only systems. Hardware byte and half-word accesses
(in contrast to software byte and half-word accesses) can be performed when the FZ
bit is 1, because these accesses do not rely on the BP field.

3.4.6.3 SYSTEM ALTERNATIVES AND COMPATIBILITY

The two mechanisms for performing byte and half-word accesses create the
possibility of two types of systems. These are named for convenience:

• Type 1: simple, word-only accesses in external devices and memories; software
byte and half-word accesses.

• Type 2: byte/half-word strobes in external devices and memories; hardware byte
and half-word accesses by the Am29050 microprocessor.

The provision for hardware byte and half-word accesses encourages Type 2 systems.
Software for Type 1 systems can execute on Type 2 systems, but the reverse is not
true. Software compatibility is possible primarily because of the DW bit and because
the Am29050 microprocessor sets the BP field with an appropriate byte pointer even
when it performs byte and half-word accesses with internal hardware. Also, the
system must return a full word in either type of system, regardless of the access
data-width. The DW bit must be 0 in Type 1 systems and must be 1 in Type 2 systems. To
illustrate compatibility between systems, consider the following steps of an unsigned
byte load compiled for a Type 1 system, but executing on a Type 2 system:

1. Perform a load with OPT = 001 and SB = 1.

• Type 1 system: The addressed word is accessed and placed into the destination
register. The BP field is set with the two least-significant bits of the address.

• Type 2 system: The addressed byte is accessed, aligned, padded, and placed into
the destination register. The BP field is set to point to the low-order byte, reflecting
the alignment that has been performed (the pointer depends on the value of the
BO bit).

2. Perform a byte extract on the loaded word.

• Type 1 system: The byte selected by the BP field is aligned to the low-order byte
of the destination register and the remainder of the word is zero-extended. The
selected byte may be in any byte position.

• Type 2 system: The byte selected by the BP field (set to point to the low-order byte)
is aligned to the low-order byte of the destination register and the remainder of the
word is zero-extended. (Note that the selected byte was already in the low-order
byte position. This operation does not change program state but merely allows
software compatibility.)
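
The two steps above can be traced with a small Python model. It is a sketch under stated assumptions: the helper names are hypothetical, the loaded word is passed in directly, and the Type 2 load is modeled with sign-extension padding (SB = 1) and with both BP bits set to the complement of BO, as described in Section 3.4.6.2.

```python
def _shift(bp: int, bo: int) -> int:
    return 8 * (3 - bp) if bo == 0 else 8 * bp


def exbyte(word: int, bp: int, bo: int) -> int:
    """Byte extract: select the byte named by BP and zero-extend it."""
    return (word >> _shift(bp, bo)) & 0xFF


def load_type1(word: int, addr: int, bo: int):
    """DW = 0: the whole word is loaded; BP gets the low address bits."""
    return word, addr & 0x3


def load_type2(word: int, addr: int, bo: int):
    """DW = 1, OPT = 001, SB = 1: byte aligned and sign-extended."""
    byte = (word >> _shift(addr & 0x3, bo)) & 0xFF
    if byte & 0x80:
        byte |= 0xFFFFFF00
    bp = 0b11 if bo == 0 else 0b00     # complement of BO in both bits
    return byte, bp


def unsigned_byte_read(load, word: int, addr: int, bo: int) -> int:
    """The Type 1 instruction sequence: load, then byte extract."""
    reg, bp = load(word, addr, bo)
    return exbyte(reg, bp, bo)
```

In this model the same two-instruction sequence yields the same zero-extended byte with either `load_type1` or `load_type2`, for every byte address and both byte orders.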

The recommended instruction sequences for all types of byte and half-word accesses 
and for both types of systems are enumerated below. Compatibility between these 
systems follows the above example, but for brevity, compatibility is not described in 
detail here. 

Byte Read, Unsigned:

   Type 1                          Comments
   load    0,17,temp,addr          ; OPT=001, SB=1
   exbyte  temp,temp,0             ; get byte

   Type 2                          Comments
   load    0,1,temp,addr           ; OPT=001, SB=0

Byte Read, Signed:

   Type 1                          Comments
   load    0,17,temp,addr          ; OPT=001, SB=1
   exbyte  temp,temp,0             ; get byte
   sll     temp,temp,24            ; sign extend
   sra     temp,temp,24

   Type 2                          Comments
   load    0,17,temp,addr          ; OPT=001, SB=1 (sign extended)

Byte Write:

   Type 1                          Comments
   load    0,17,temp,addr          ; OPT=001, SB=1
   inbyte  temp,temp,data          ; insert byte
   store   0,1,temp,addr           ; store

   Type 2                          Comments
   store   0,1,data,addr           ; OPT=001, SB=0

Half-Word Read, Unsigned:

   Type 1                          Comments
   load    0,18,temp,addr          ; OPT=010, SB=1
   exhw    temp,temp,0             ; get half-word unsigned

   Type 2                          Comments
   load    0,2,temp,addr           ; OPT=010, SB=0

Half-Word Read, Signed:

   Type 1                          Comments
   load    0,18,temp,addr          ; OPT=010, SB=1
   exhws   temp,temp               ; get half-word sign-extend

   Type 2                          Comments
   load    0,18,temp,addr          ; OPT=010, SB=1 (sign-extend)

Half-Word Write:

   Type 1                          Comments
   load    0,18,temp,addr          ; OPT=010, SB=1
   inhw    temp,temp,data          ; insert half-word
   store   0,2,temp,addr           ; store

   Type 2                          Comments
   store   0,2,data,addr           ; OPT=010, SB=0

3.5 INTERRUPTS AND TRAPS

Interrupts and traps cause the Am29050 microprocessor to suspend the execution of 
an instruction sequence and to begin the execution of a new sequence. The proces- 
sor may or may not later resume the execution of the original instruction sequence. 

The distinction between interrupts and traps is largely one of causation and enabling. 
Interrupts allow external devices and the Timer Facility to control processor execution, 
and are always asynchronous to program execution. Traps are intended to be used 
for certain exceptional events that occur during instruction execution, and are gener- 
ally synchronous to program execution. 

Throughout this manual, a distinction is made between the point at which an interrupt 
or trap occurs and the point at which it is taken. An interrupt or trap is said to occur 
when all conditions that define the interrupt or trap are met. However, an interrupt or 
trap that occurs is not necessarily recognized by the processor, either because of 
various enables, or because of the processor's operational mode (e.g., Halt mode). 
An interrupt or trap is taken when the processor recognizes the interrupt or trap and 
alters its behavior accordingly. 

3.5.1 Interrupts 



Interrupts are caused by signals applied to any of the external inputs INTR(3-0), or by
the Timer Facility (see Section 7.3.6). The processor may be disabled from taking
certain interrupts by the masking capability provided by the Disable All Interrupts and
Traps (DA) bit, Disable Interrupts (DI) bit, and Interrupt Mask (IM) field in the Current
Processor Status Register.

The DA bit disables all interrupts. The DI bit disables external interrupts without
affecting the recognition of traps and Timer interrupts. The 2-bit IM field selectively enables
external interrupts as follows:

   IM Value    Result

   00          INTR0 enabled
   01          INTR(1-0) enabled
   10          INTR(2-0) enabled
   11          INTR(3-0) enabled

Note that the INTR0 interrupt cannot be disabled by the IM field. Also, note that no
external interrupt is taken if either the DA or DI bit is 1. The Interrupt Pending bit in the
Current Processor Status indicates that one or more of the signals INTR(3-0) is
active, but that the corresponding interrupt is disabled due to the value of either DA, DI,
or IM.
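
The masking rules can be summarized in a short Python sketch. The function names are hypothetical, and the model covers only the external INTR(3-0) inputs; the Timer interrupt is not modeled.

```python
def enabled_external_interrupts(da: int, di: int, im: int) -> set:
    """Set of INTR input numbers that can currently be taken."""
    if da == 1 or di == 1:
        return set()                 # DA or DI masks every external input
    # IM = n enables INTR0 through INTRn; INTR0 is never masked by IM.
    return set(range((im & 0x3) + 1))


def interrupt_pending(active: set, da: int, di: int, im: int) -> bool:
    """Model of the Interrupt Pending bit: an active input that is disabled."""
    return bool(active - enabled_external_interrupts(da, di, im))
```

For example, with IM = 01 an active INTR3 input is reported as pending but is not taken, while an active INTR1 input is taken.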



3.5.2 Traps 



Traps are caused by signals applied to one of the inputs TRAP(1-0), or by exceptional
conditions such as protection violations. Except for the Instruction Access Exception,
Data Access Exception, and Coprocessor Exception traps, traps are disabled by the
DA bit in the Current Processor Status; a 1 in the DA bit disables traps, and a 0
enables traps. It is not possible to selectively disable individual traps. If a trap occurs
(except a trap caused by TRAP(1-0)) and the DA bit is 1, the processor enters the
Monitor mode via a Monitor trap (see Section 3.5.7).

3.5.3 Wait Mode

A wait-for-interrupt capability is provided by the Wait mode. The processor is in the
Wait mode whenever the Wait Mode (WM) bit of the Current Processor Status is 1.
While in Wait mode, the processor neither fetches nor executes instructions, and
performs no external accesses. The Wait mode is exited when an interrupt or trap is
taken.

Note that the processor can take only those interrupts or traps for which it is enabled,
even in the Wait mode. For example, if the processor is in the Wait mode with a DA
bit of 1, it can leave the Wait mode only via the Reset mode (see Section 3.9) or a
WARN trap (see Section 3.5.6).

3.5.4 Vector Area 

Interrupt and trap processing relies on the existence of a user-managed Vector Area
in external instruction/data memory or instruction read-only memory (instruction
ROM). The Vector Area begins at an address specified by the Vector Area Base
Address Register, and provides for as many as 256 different interrupt and trap
handling routines. The processor reserves 64 routines for system operation and
instruction emulation. The number and definition of the remaining 192 possible
routines are system-dependent.

The Vector Area has one of two possible structures as determined by the Vector
Fetch (VF) bit in the Configuration Register. The first structure, as described below,
requires less external memory than the second, but imposes the performance penalty
of the vector-table lookup.

If the VF bit is 1, the structure of the Vector Area is a table of vectors in instruction/
data memory. The layout of a single vector is shown in Figure 3-49. Each vector gives
the beginning word-address of the associated interrupt or trap handling routine, and
specifies, by the R bit, whether the routine is contained in instruction/data memory
(R = 0) or instruction ROM (R = 1).



Figure 3-49 Vector Table Entry

[Figure: 32-bit vector word (bits 31-0) containing the Handler Starting Address
field and the R bit]



If the VF bit is 0, the structure of the Vector Area is a segment of contiguous blocks of
instructions in instruction/data memory or instruction ROM. The ROM Vector Area
(RV) bit of the Configuration Register determines whether the Vector Area is in
instruction/data memory (RV = 0) or instruction ROM (RV = 1). A 64-instruction block
contains exactly one interrupt or trap handling routine, and blocks are aligned on
64-instruction address boundaries.

3.5.4.1 VECTOR NUMBERS 

When an interrupt or trap is taken, the processor determines an 8-bit vector number
associated with the interrupt or trap. The vector number gives either the number of a
vector table entry or the number of an instruction block, depending on the value of the
VF bit. If the VF bit is 1, the physical address of the vector table entry is generated by
replacing bits 9-2 of the value in the Vector Area Base Address Register with the
vector number. If the VF bit is 0, the physical address of the first instruction of the
handling routine is generated by replacing bits 15-8 of the value in the Vector Area
Base Address Register with the vector number.

Vector numbers are either pre-defined, or specified by an instruction causing the trap.
The assignment of vector numbers is shown in Table 3-1 (vector numbers are in
decimal notation). Vector numbers 64 to 255 are for use by trapping instructions; the
definition of the routines associated with these numbers is system-dependent.
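
The two address computations can be written out as a short Python sketch; the function names are hypothetical and `vab` stands for the value in the Vector Area Base Address Register.

```python
def vector_entry_address(vab: int, vector_number: int) -> int:
    """VF = 1: replace bits 9-2 of the base value with the vector number."""
    cleared = vab & ~(0xFF << 2) & 0xFFFFFFFF
    return cleared | ((vector_number & 0xFF) << 2)


def handler_block_address(vab: int, vector_number: int) -> int:
    """VF = 0: replace bits 15-8 of the base value with the vector number."""
    cleared = vab & ~(0xFF << 8) & 0xFFFFFFFF
    return cleared | ((vector_number & 0xFF) << 8)
```

The shift amounts follow from the entry sizes: a vector table entry is one 4-byte word, and a 64-instruction block occupies 256 bytes.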

3.5.5 Interrupt and Trap Handling 

Interrupt and trap handling consists of two distinct operations: taking the interrupt or
trap, and returning from the interrupt or trap handler. If the interrupt or trap handler
returns directly to the interrupted routine, the interrupt or trap handler need not save
and restore processor state.

3.5.5.1 TAKING AN INTERRUPT OR TRAP 

The following operations are performed in sequence by the processor when an
interrupt or trap is taken.

1. Instruction execution is suspended.

2. Instruction fetching is suspended.

3. Any in-progress load or store operation is completed. Any additional operations
are canceled in the case of load multiple and store multiple.

4. The contents of the Current Processor Status Register are copied into the Old
Processor Status Register.

5. The Current Processor Status Register is modified as shown in Figure 3-50 (the
value u means unaffected, and the MM bit is set only if the trap causes the
processor to enter the Monitor mode). Note that setting the Freeze (FZ) bit freezes
the Channel Address, Channel Data, Channel Control, Program Counter 0,
Program Counter 1, Program Counter 2, and ALU Status Registers.

6. The address of the first instruction of the interrupt or trap handler is determined. If
the VF bit of the Configuration Register is 1, the address is obtained by accessing
a vector from instruction/data memory, using the physical address obtained from
the Vector Area Base Address Register and the vector number. This access
appears on the channel as a data access, and the OPT(2-0) signals indicate a
word-length access. If the VF bit is 0, the instruction address is given directly by
the Vector Area Base Address Register and the vector number.

7. If the VF bit is 1, the R bit in the vector fetched in step 6 is copied into the RE bit of
the Current Processor Status Register. If the VF bit is 0, the RV bit of the
Configuration Register is copied into the RE bit. This step determines whether or
not the first instruction of the interrupt handler is in instruction ROM.

8. An instruction fetch is initiated using the instruction address determined in step 6.
At this point, normal instruction execution resumes.
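
Steps 4 through 7 can be sketched as a small state update in Python. This is a partial, hypothetical model: register contents are plain dictionaries, only the FZ and RE bits of the Current Processor Status are modeled (the other changes shown in Figure 3-50 are omitted), and the vector word is assumed to hold the R bit in bit 0 with the handler word-address in the remaining high-order bits.

```python
def take_interrupt(cps: dict, cfg: dict, vab: int, vector_number: int, read_word):
    """Return (new CPS, OPS, handler address) for an interrupt or trap."""
    ops = dict(cps)                       # step 4: CPS copied to OPS
    new_cps = dict(cps)
    new_cps["FZ"] = 1                     # step 5: only FZ is modeled here
    vn = vector_number & 0xFF
    if cfg["VF"] == 1:                    # step 6: vector table lookup
        entry = (vab & ~(0xFF << 2) & 0xFFFFFFFF) | (vn << 2)
        vector = read_word(entry)         # word-length data access
        handler = vector & 0xFFFFFFFC     # assumed field layout
        new_cps["RE"] = vector & 0x1      # step 7: R bit (assumed bit 0)
    else:                                 # step 6: direct block address
        handler = (vab & ~(0xFF << 8) & 0xFFFFFFFF) | (vn << 8)
        new_cps["RE"] = cfg["RV"]         # step 7: RV copied into RE
    return new_cps, ops, handler
```

The `read_word` argument stands for the external data access of step 6; with VF = 0 it is never called, which is the lookup penalty the block structure avoids.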

Note that the processor does not explicitly save the contents of any registers when an
interrupt is taken. If register saving is required, it is the responsibility of the interrupt-
or trap-handling routine. For proper operation, registers must be saved before any
further interrupts or traps may be taken. The FZ bit must be reset at least two
instructions before interrupts or traps are re-enabled, to allow program state to be reflected
properly in processor registers if an interrupt or trap is taken.

Table 3-1 Vector Number Assignments

Number   Type of Trap or Interrupt               Cause

0        Illegal Opcode                          Executing undefined instruction*
1        Unaligned Access                        Access on unnatural boundary, TU = 1
2        Out of Range                            Overflow or underflow
3        Coprocessor Not Present                 Coprocessor access, CP = 0
4        Coprocessor Exception                   Coprocessor DERR response
5        Protection Violation                    Invalid User-mode operation
6        Instruction Access Exception            IERR response
7        Data Access Exception                   DERR response, not coprocessor
8        User-Mode Instruction TLB Miss          No TLB entry for translation
9        User-Mode Data TLB Miss                 No TLB entry for translation
10       Supervisor-Mode Instruction TLB Miss    No TLB entry for translation
11       Supervisor-Mode Data TLB Miss           No TLB entry for translation
12       Instruction MMU Protection Violation    TLB or RMU UE/SE = 0
13       Data MMU Protection Violation           TLB or RMU UR/SR = 0, UW/SW = 0 on write
14       Timer                                   Timer Facility
15       Trace                                   Trace Facility, breakpoint comparisons
16       INTR0                                   INTR0 input
17       INTR1                                   INTR1 input
18       INTR2                                   INTR2 input
19       INTR3                                   INTR3 input
20       TRAP0                                   TRAP0 input
21       TRAP1                                   TRAP1 input
22       Floating-Point Exception                Unmasked floating-point exception
23       Reserved
24       FMAC exception                          ACF in FPE Register = 00 or 11
25       DMAC exception                          ACF in FPE Register = 00 or 11
26-27    Reserved
28       Reserved for instruction emulation
         (opcodes BF, CF-D6, DC)
29       Reserved for instruction emulation
         (opcode DD)
30-32    Reserved
33       DIVIDE                                  DIVIDE instruction
34       Reserved
35       DIVIDU                                  DIVIDU instruction
36       CONVERT exception                       FS = 00 or 11 or FD = 00 or 11
37       SQRT exception                          FS = 00 or 11
38       CLASS exception                         FS = 00 or 11
39       Reserved for instruction emulation
         (opcode E7)
40       MTACC exception                         FMT = 11 or FMT = 00 and ACF = 00 or 11
41       MFACC exception                         FMT = 11 or FMT = 00 and ACF = 00 or 11
42-55    Reserved
56       Reserved for instruction emulation
         (opcode F8)
57       Reserved
58-63    Reserved for instruction emulation
         (opcode FA-FF)
64-255   ASSERT and EMULATE instruction traps
         (vector number specified by instruction)

*This vector number also results if an external device removes INTR(3-0) or TRAP(1-0)
before the corresponding interrupt or trap is taken by the processor.

Figure 3-50 Current Processor Status After an Interrupt or Trap

3.5.5.2 RETURNING FROM AN INTERRUPT OR TRAP

Two instructions are used to resume the execution of an interrupted program:
Interrupt Return (IRET), and Interrupt Return and Invalidate (IRETINV). These instructions
are identical except in one respect: the IRETINV instruction resets all Valid bits in the
Branch Target Cache memory, whereas the IRET instruction does not affect the Valid
bits.

In some situations, the processor state must be set properly by software before the
interrupt return is executed. The following is a list of operations normally performed in
such cases:

1. The Current Processor Status is configured as shown in Figure 3-51 (the value x
is a don't care and the value u means unaffected). Note that setting the FZ bit
freezes the registers listed below so that they may be set for the interrupt return.

2. The Old Processor Status is set to the value of the Current Processor Status for
the target routine.

3. The Channel Address, Channel Data, and Channel Control registers are set to
restart or resume uncompleted channel operations of the target routine.

4. The Program Counter 1 and Program Counter 0 registers are set to the addresses
of the first and second instructions, respectively, to be executed in the target
routine.

5. Other registers are set as required. These may include registers such as the ALU
Status, Q, and so forth, depending on the particular situation. Some of these
registers are unaffected by the FZ bit, so they must be set in such a manner that
they are not modified unintentionally before the interrupt return.



Figure 3-51 Current Processor Status Before Interrupt Return 
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Once the processor registers are configured properly, as described above, an inter- 
rupt return instruction (IRET or IRETINV) performs the remaining steps necessary to 
return to the target routine. The following operations are performed by the interrupt 
return instruction: 

1. Any in-progress load or store operation is completed. If a load-multiple or 
store-multiple sequence is in progress, the interrupt return is not executed until the 
sequence completes. 

2. Interrupts and traps are disabled, regardless of the settings of the DA, Dl, and IM 
fields of the Current Processor Status, for steps 3 through 10. 

3. If the interrupt return instruction is an IRETINV, ail Valid bits in the Branch Target 
Cache memory are reset. 

4. The contents of the Old Processor Status Register are copied into the Current 
Processor Status Register. This normally resets the FZ bit allowing the Program 
Counter 0,1,2, Channel Address, Data. Control, and ALU Status registers to 
update normally. Since certain bits of the Current Processor Status Register 
always are updated by the processor, this copy operation may be irrelevant for 
certain bits {e.g., the Interrupt Pending bit). 

5. If the Contents Valid (CV) bit of the Channel Control Register is 1 , and the Not 
Needed (NN) and Multiple Operation (ML) bits are both 0, an external access is 
started. This operation is based on the contents of the Channel Address, Channel 
Data, and Channel Control registers. The Current Processor Status Register 
conditions the access— as is normally the case. Note that load-multiple and 
store-multiple operations are not restarted at this point. 

6. The address in Program Counter 1 is used to fetch an instruction. The Current 
Processor Status Register conditions the fetch. This step is treated as a branch in 
the sense that the processor searches the Branch Target Cache memory for the 
target of the fetch. 

7. The instruction fetched in step 6 enters the decode stage of the pipeline. 

8. The address in Program Counter is used to fetch an instruction. The Current 
Processor Status Register conditions the fetch. This step is treated as a branch in 
the sense that the processor searches the Branch Target Cache memory for the 
target of the fetch. 

9. The instruction fetched in step 6 enters the execute stage of the pipeline, and the 
instruction fetched in step 8 enters the decode stage. 

10. If the CV bit in the Channel Control Register is 1, the NN bit is 0, and the ML bit
is 1, a load-multiple or store-multiple sequence is started, based on the contents
of the Channel Address, Channel Data, and Channel Control registers.

11. Interrupts and traps are enabled per the appropriate bits in the Current Processor
Status Register.

12. The processor resumes normal operation. 
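
Steps 5 and 10 of the sequence above choose between restarting a single external
access and restarting a load-multiple or store-multiple sequence, based on three bits
of the Channel Control Register. The following Python sketch restates that decision
only. The function name and the returned strings are illustrative assumptions, not
part of the processor definition.

```python
def channel_restart_action(cv: int, nn: int, ml: int) -> str:
    """Classify the restart performed during an interrupt return.

    cv, nn, and ml stand for the Contents Valid, Not Needed, and Multiple
    Operation bits of the Channel Control Register (each 0 or 1).
    """
    if cv == 1 and nn == 0:
        if ml == 0:
            # Step 5: a single external access is started.
            return "single-access"
        # Step 10: a load-multiple or store-multiple sequence is started.
        return "multiple-access"
    # Contents not valid, or the access is not needed: nothing is restarted.
    return "none"
```

For example, channel_restart_action(1, 0, 1) corresponds to the restart described in
step 10, and any call with cv equal to 0 corresponds to no restart at all.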

3.5.5.3 FAST INTERRUPT PROCESSING 

The registers affected by the FZ bit of the Current Processor Status Register are 
those which are modified by almost any usual sequence of instructions. Since the FZ 
bit is set by an interrupt or trap, the interrupt or trap handler is able to execute while 
not disturbing the state of the interrupted routine, though its execution is somewhat 
restricted. Thus, it is not necessary in many cases for the interrupt or trap handler to 
save the registers that are affected by the FZ bit. 



The processor provides an additional benefit if the Program Counter 0 and Program
Counter 1 registers are not modified by the interrupt or trap handler. If Program
Counters 0 and 1 contain the addresses of sequential instructions when an interrupt
or trap is taken, and if they are not modified before an interrupt return is executed,
step 8 of the interrupt return sequence above occurs as a sequential fetch, instead of
a branch, for the interrupt return. The performance impact of a sequential fetch is
normally less than that of a non-sequential fetch.

Because the registers affected by the FZ bit are sometimes required for instruction
execution, it is not possible for the interrupt or trap handler to execute all instructions, 
unless the required registers are first saved elsewhere (e.g., in one or more global 
registers). Most of the restrictions due to register dependencies are obvious (e.g., the 
Byte Pointer for byte extracts), and will not be discussed here. Other less obvious 
restrictions are listed below: 

1. Load Multiple and Store Multiple. The Channel Address, Channel Data, and 
Channel Control registers are used to sequence load-multiple and store-multiple 
operations, so these instructions cannot be executed while the registers are 
frozen. However, note that other external accesses may occur; the Channel 
Address, Channel Data, and Channel Control registers are required only to restart 
an access after an exception, and the interrupt or trap handler is not expected to 
encounter any exceptions. 

2. Loads and stores which set the Byte Pointer. If the Set Byte Pointer (SB) bit of a
load or store instruction is 1, and the FZ bit is also 1, there is no effect on the Byte
Pointer. Thus, the execution of external byte and half-word accesses using this
mechanism is not possible.

3. Extended arithmetic. The Carry bit of the ALU Status Register is not updated while
the FZ bit is 1.

4. Divide step instructions. The Divide Flag of the ALU Status Register is not
updated when the FZ bit is 1.

If the interrupt or trap handler does not save the state of the interrupted routine, it
cannot allow additional interrupts and traps. Also, the operation of the interrupt or trap
handler cannot depend on any trapping instructions (e.g., DIVIDE and DIVIDU
instructions, illegal operation codes, arithmetic overflow, etc.), since these cause a
Monitor trap (see Section 3.5.7). There are certain cases, however, where traps are
unavoidable; these are discussed in Section 3.5.10.



3.5.6 WARN Trap 



The processor recognizes a special trap, caused by the activation of the WARN input,
which cannot be masked. The WARN trap is intended to be used for severe
system-error or deadlock conditions. It allows the processor to be placed in a known,
operable state, while preserving much of its original state for error reporting and
possible recovery. Therefore, it shares some features with the Reset mode as well
as features common to other traps described in this section.



The major differences between the WARN trap and other traps are: 

1. The processor does not wait for an in-progress external access to complete
before taking the trap, since this access might not complete. However, the 
information related to any outstanding access is retained by the Channel Address, 
Channel Data, and Channel Control registers when the trap is taken. 

2. The vector-fetch operation is not performed, regardless of the VF bit of the
Configuration Register, when the WARN trap is taken. Instead, the ROM Enable
(RE) bit in the Current Processor Status is set, and instruction fetching begins
immediately at address 16 in the instruction ROM. The trap handler executes
directly from the instruction ROM without the need to access external (and
possibly non-functional or invalid) instruction/data memory.



Note that the WARN trap may disrupt the state of the routine that is executing when it
is taken, prohibiting this routine from being restarted.

3.5.7 Monitor Trap 

The processor takes a special trap, called the Monitor trap, to enter the Monitor mode. 
A Monitor trap is taken when the DA bit of the Current Processor Status is 1 and a
trap occurs, except for a trap caused by the TRAP(1-0) inputs. Interrupts caused by
the INTR(3-0) inputs and the Timer facility cannot cause a Monitor trap. 
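
The entry condition above can be restated as a small predicate. This is a minimal
Python sketch, assuming events are identified by short names; the set contents follow
the paragraph above, while the function name and the naming scheme for events are
hypothetical.

```python
# Events that never cause a Monitor trap, per the paragraph above.
EXTERNAL_TRAP_INPUTS = {"TRAP0", "TRAP1"}
INTERRUPT_SOURCES = {"INTR0", "INTR1", "INTR2", "INTR3", "Timer"}


def causes_monitor_trap(da_bit: int, event: str) -> bool:
    """Return True if `event` leads to a Monitor trap.

    Any event name outside the two sets above is treated as an ordinary trap
    (an assumption of this sketch, not a processor rule).
    """
    if da_bit != 1:
        return False
    if event in EXTERNAL_TRAP_INPUTS or event in INTERRUPT_SOURCES:
        return False
    return True
```

The sketch covers only entry into the Monitor mode; the narrower set of traps
recognized while already in the Monitor mode is described later in this section.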

The major difference between a Monitor trap and other traps is that the processor
does not perform the vector-fetch operation. Instead, the processor immediately
begins fetching instructions at location 16 in the instruction ROM, as for a WARN trap.
The Monitor trap can be distinguished from a WARN trap because the Monitor Mode
(MM) bit in the Current Processor Status is 1. The processor also behaves as if the
Freeze (FZ), ROM Enable, Physical Addressing/Data, Physical Addressing/Instruction,
and Supervisor Mode bits of the Current Processor Status Register were 1.
However, the Current Processor Status Register is not affected.

When the Monitor trap is taken, the Shadow Program Counters 0, 1, and 2 contain
instruction addresses for the suspended program. The values in the shadow program
counters are held while the processor is in the Monitor mode, unless they are
explicitly modified by a move-to-special-register instruction. This allows the suspended
program to be restarted even if the FZ bit was 1 when the trap was taken; if the FZ bit
was 1, the Program Counter 0, 1, and 2 registers do not contain the appropriate
addresses.

Also, when the Monitor trap is taken, the Reason Vector Register is set to indicate the
cause of the trap. The Reason Vector Register is set with the vector number of the
trap condition that caused the Monitor trap. If the Monitor trap is caused by a WARN
trap, the value 16 (decimal) is placed into the Reason Vector Register. This is the
vector number for the INTR0 interrupt; since interrupts cannot cause a Monitor trap,
there is no conflict.

In the Monitor mode, the processor ignores interrupts and traps, except for the
following traps: Data Access Exception, Coprocessor Exception, Instruction Access
Exception, Instruction TLB Miss, and Instruction MMU Protection Violation. An
occurrence of one of these traps will cause another Monitor trap; however, the shadow
program counters and the Reason Vector Register will not be set.

An IRET or IRETINV instruction, executed in the Monitor mode, causes a return from 
Monitor mode. The processor performs all actions that normally apply for an interrupt
return, except that it simply clears the MM bit in the Current Processor Status Register 
rather than loading this register from the Old Processor Status Register, and it 
resumes execution using the addresses in the shadow program counters rather than 
the program counters. 

3.5.8 Sequencing of Interrupts and Traps

On every cycle, the processor decides either to execute instructions or to take an 
interrupt or trap. Since there are multiple sources of interrupts and traps, more than 
one interrupt or trap may be pending on a given cycle. 

To resolve conflicts, interrupts and traps are taken according to the priority shown in
Table 3-11. In this table, interrupts and traps are listed in order of decreasing priority.
This section discusses the first three columns of Table 3-11. The last two columns are
discussed in Section 3.5.9.

In Table 3-11, interrupts and traps fall into one of two categories depending on the
timing of their occurrence relative to instruction execution. These categories are
indicated in the third column of Table 3-11 by the labels Inst and Async. These labels
have the following meaning:

1. Inst: Generated by the execution or attempted execution of an instruction.

2. Async: Generated asynchronous to and independent of the instruction being
executed, although it may be a result of an instruction executed previously.

The principle for interrupt and trap sequencing is that the highest priority interrupt or 
trap is taken first. Other interrupts and traps remain active until they can be taken, or 
are regenerated when they can be taken. This is accomplished, depending on the 
type of interrupt or trap, as follows: 

1. All traps in Table 3-11 with priority 13 through 15 are regenerated by the
re-execution of the causing instruction.

2. Most of the interrupts and traps of priority 4 through 12 must be held by external
hardware until they are taken. The exceptions to this are listed in 3) below.

3. The exceptions to 2 above are the Data Access Exception trap, the Coprocessor 
Exception trap, the Timer interrupt, and the Trace trap. These are caused by bits 
in various registers in the processor and are held by these registers until taken or 
cleared. The relevant bits are: the Transaction Faulted (TF) bit of the Channel 
Control Register for Data Access Exception and Coprocessor Exception traps, the 
Interrupt (IN) bit of the Timer Reload Register for Timer interrupts, and the Trace 
Pending (TP) bit of the Current Processor Status Register for Trace traps. 

4. All traps of priority 2 and 3 in Table 3-11, except for the Unaligned Access trap,
are not regenerated. These traps are mutually exclusive, and are given high
priority because they cannot be regenerated; they must be taken if they occur. If
one of these traps occurs at the same time as a reset or WARN trap, it is not
taken, and its occurrence is lost.

5. The Unaligned Access trap is regenerated internally when an external access is
restarted by the Channel Address, Channel Data, and Channel Control registers.
Note that this trap is not necessarily exclusive to the traps discussed in 4) above.

Note that the Channel Address, Channel Data, and Channel Control registers are set
for a WARN trap only if an external access is in progress when the trap is taken.
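
The basic rule, that the highest-priority pending interrupt or trap is taken first, can
be sketched as follows. This is a minimal Python model, assuming pending events are
identified by the names used in Table 3-11. The dictionary lists only a subset of the
table's entries, and the function name is hypothetical. Holding and regeneration of the
events that are not taken is not modeled.

```python
from typing import Iterable, Optional

# Priority numbers as listed in Table 3-11 (1 is highest); subset only.
PRIORITY = {
    "WARN": 1,
    "Data MMU Protection Violation": 2,
    "Out of Range": 3,
    "Data Access Exception": 4,
    "TRAP0": 5,
    "TRAP1": 6,
    "INTR0": 7,
    "INTR1": 8,
    "INTR2": 9,
    "INTR3": 10,
    "Timer": 11,
    "Illegal Opcode": 15,
}


def next_event(pending: Iterable[str]) -> Optional[str]:
    """Return the pending event with the highest priority, or None."""
    candidates = list(pending)
    if not candidates:
        return None
    # The smallest priority number wins.
    return min(candidates, key=lambda name: PRIORITY[name])
```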

3.5.9 Exception Reporting and Restarting 

When an instruction encounters an exceptional condition, the Program Counter 0,
Program Counter 1, and Program Counter 2 registers report the relevant instruction
address(es), and allow the instruction sequence to be restarted once the exceptional
condition has been remedied (if possible). Similarly, when an external access or
coprocessor transfer encounters an exceptional condition, the Channel Address,
Channel Data, and Channel Control registers report information on the access or
transfer, and allow it to be restarted. This section describes the interpretation and use
of these registers.

Table 3-11    Interrupt and Trap Priority Table

Priority      Type of Interrupt or Trap                 Inst/Async  PC1   Channel Regs

1 (Highest)   WARN                                      Async       Next  See Note 1

2             User-Mode Data TLB Miss                   Inst        Next  All
              Supervisor-Mode Data TLB Miss             Inst        Next  All
              Data MMU Protection Violation             Inst        Next  All

3             Unaligned Access                          Inst        Next  All
              Coprocessor Not Present                   Inst        Next  All
              Out of Range                              Inst        Next  N/A
              Floating-Point Exceptions                 Inst        Next  N/A
              Assert Instructions                       Inst        Next  N/A
              Instruction Emulation                     Inst        Next  N/A
              DIVIDE                                    Inst        Next  N/A
              DIVIDU                                    Inst        Next  N/A

4             Data Access Exception                     Async       Next  All
              Coprocessor Exception                     Async       Next  All

5             TRAP0                                     Async       Next  Multiple
6             TRAP1                                     Async       Next  Multiple
7             INTR0                                     Async       Next  Multiple
8             INTR1                                     Async       Next  Multiple
9             INTR2                                     Async       Next  Multiple
10            INTR3                                     Async       Next  Multiple
11            Timer                                     Async       Next  Multiple
12            Trace (caused by TE, TP)                  Async       Next  Multiple

13            User-Mode Instruction TLB Miss            Inst        Curr  N/A
              Supervisor-Mode Instr. TLB Miss           Inst        Curr  N/A
              Instruction MMU Protection Violation      Inst        Curr  N/A
              Instruction Access Exception              Inst        Curr  N/A

14            Trace (caused by breakpoint comparison)   Inst        Curr  N/A

15 (Lowest)   Illegal Opcode                            Inst        Curr  N/A
              Protection Violation                      Inst        Curr  N/A

Note 1: The Channel Address, Channel Data, and Channel Control registers are set for a WARN
trap only if an external access is in progress when the trap is taken.

The PC1 column in Table 3-11 describes the value held in the Program Counter 1
Register (PC1) when the interrupt or trap is taken. For traps in the Inst category, PC1
contains either the address of the instruction causing the trap, indicated by Curr, or
the address of the instruction following the instruction causing the trap, indicated
by Next.

For interrupts and traps in the Async category, PC1 contains the address of the first
instruction which was not executed due to the taking of the interrupt or trap. This is
the next instruction to be executed upon interrupt return, as indicated by Next in the
PC1 column.

3.5.9.1 INSTRUCTION EXCEPTIONS 

For traps caused by the execution of an instruction (e.g., the Out of Range trap), the 
Program Counter 2 Register contains the address of the instruction causing the trap. 
In all of these cases, PC1 is in the Next category. The Exception Opcode Register
contains the operation code of the instruction causing the trap. 

The traps associated with instruction fetches (i.e., those of priority 13) occur only if the 
processor attempts the execution of the associated instruction. An exception may be 
detected during an instruction prefetch, but the associated trap does not occur if a 
non-sequential fetch occurs before the processor attempts the execution of the invalid 
instruction. This prevents the spurious indication of instruction exceptions. 

In the case of a Monitor trap, the relevant instruction addresses are contained in the 
Shadow Program Counter 0, 1, and 2 registers rather than the Program Counter 0, 1, 
and 2 registers. 

3.5.9.2 DATA EXCEPTIONS 

The Channel Regs column of Table 3-11 indicates the cases for which the Channel
Address, Channel Data, and Channel Control registers contain information related to
an external access or coprocessor transfer (these registers collectively are termed
"channel registers" in the following discussion). For the cases indicated, the access or
transfer did not complete because of some exceptional condition. Note that the
Channel Data Register contains relevant information only in the case of a store.



For the WARN trap, the channel registers are valid only if a load or store was in
progress when the trap was taken. Recall that the WARN trap does not wait for any
in-progress access to complete.

For the traps with an All in the Channel Regs column of Table 3-11, the channel
registers contain information relevant to the trap in all cases. These traps are
associated with exceptional events during external accesses or coprocessor transfers.

For the traps with a Multiple in the Channel Regs column, the channel registers might
contain information for restarting an interrupted load-multiple or store-multiple
operation. In these cases, the operation did not encounter an exception, but was simply
canceled for latency considerations.

The information contained in the channel registers allows the processor to restart the
related operation during an interrupt return sequence, without any special assistance
by software. Software must only ensure that the relevant information is retained in, or
restored to, the channel registers before an interrupt return is executed.

3.5.10 Arithmetic Exceptions

Integer and floating-point instructions can cause Out of Range or Floating-Point
Exception traps, respectively, if an exception is detected during the arithmetic operation.
This section describes the conditions under which these traps occur and the
additional operations performed beyond those described in Section 3.5.5.

3.5.10.1 INTEGER EXCEPTIONS 

Some integer add and subtract instructions (ADDS, ADDU, ADDCS, ADDCU, SUBS,
SUBU, SUBCS, SUBCU, SUBRS, SUBRU, SUBRCS, and SUBRCU) cause an Out
of Range trap upon overflow or underflow of a 32-bit signed or unsigned result,
depending on the instruction.

Two integer multiply instructions, MULTIPLY and MULTIPLU, cause an Out of
Range trap upon overflow of a 32-bit signed or unsigned result, respectively, if the
MO bit of the Integer Environment Register is 0. If the MO bit is 1, these multiply
instructions cannot cause an Out of Range trap.

Two integer divide instructions, DIVIDE and DIVIDU, take the Out of Range trap
upon overflow of a 32-bit signed or unsigned result, respectively, if the DO bit of the
Integer Environment Register is 0. If the DO bit is 1, the divide instructions cannot
cause an Out of Range trap unless the divisor is zero. If the divisor is zero, an Out of
Range trap always occurs, regardless of the DO bit.
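
The multiply and divide rules above reduce to two small predicates. The following
Python sketch restates them under the assumption that overflow detection itself is
supplied by the caller as a flag; the function names are hypothetical and nothing here
models the arithmetic performed by the instructions.

```python
def multiply_out_of_range(overflow: bool, mo_bit: int) -> bool:
    """MULTIPLY/MULTIPLU: trap on overflow only while the MO bit is 0."""
    return overflow and mo_bit == 0


def divide_out_of_range(overflow: bool, divisor: int, do_bit: int) -> bool:
    """DIVIDE/DIVIDU: a zero divisor always traps; overflow traps while DO is 0."""
    if divisor == 0:
        # The DO bit does not mask division by zero.
        return True
    return overflow and do_bit == 0
```

Note the asymmetry: setting the MO bit suppresses every Out of Range trap for the
multiply instructions, whereas setting the DO bit leaves the divide-by-zero case in
place.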

In addition to the operations described in Section 3.5.5, the following operations are
performed when an Out of Range trap is taken: 

1. The operation code of the instruction causing the exception is placed in the IOP
field of the Exception Opcode Register.

2. For the MULTIPLY, MULTIPLU, DIVIDE, and DIVIDU instructions, the absolute 
register numbers of the excepting instruction's source and destination registers 
are placed into the Indirect Pointer A, Indirect Pointer B, and Indirect Pointer C 
registers. 

3. For the MULTIPLY, MULTIPLU, DIVIDE, and DIVIDU instructions, the destination 
register or registers are unchanged. 

3.5.10.2 FLOATING-POINT EXCEPTIONS 

A Floating-Point Exception trap occurs when an exception is detected during a
floating-point operation, and the exception is not masked by the corresponding bit of the
Floating-Point Mask Register. In this context, a floating-point operation is defined
as any operation that accepts a floating-point number as a source operand, that
produces a floating-point result, or both. Thus, for example, the CONVERT
instruction may create an exception while attempting to convert a floating-point value to
an integer value or vice versa. The occurrence of floating-point exceptions is discussed
in detail in Appendix C.

In addition to the operations described in Section 3.5.5, the following operations are 
performed when a Floating-Point Exception trap is taken: 

1. The operation code of the instruction causing the exception is placed in the IOP
field of the Exception Opcode Register.

2. The status of the trapping operation is written into the trap status bits of the
Floating-Point Status Register. The written status bits do not depend on the
values of the corresponding mask bits in the Floating-Point Environment Register.

3. The absolute-register numbers of the excepting instruction's source and
destination registers are placed into the Indirect Pointer A, Indirect Pointer B, and
Indirect Pointer C registers. If the RB or RC field specifies a function code, that code
is transferred to the corresponding indirect pointer. Note that if the most-significant
bit of this function code is one, the value of the Stack Pointer has been added to
the RB field, and must be subtracted to recover the original field.

4. The destination register or registers are left unchanged. 

3.5.11 Exceptions During Interrupt and Trap Handling

In most cases, interrupt and trap handling routines are executed with the DA bit in the
Current Processor Status having a value of 1. It is normally assumed that these
routines do not create many of the exceptions possible in most other processor routines,
or that whatever exceptions do occur can be handled in the Monitor mode.

If these assumptions are not valid for a particular interrupt or trap handler, it is
important that the handler save the state of the processor and reset the FZ bit of the
Current Processor Status, so that the handler itself may be restarted properly. This must
be accomplished before any interrupts or traps can be taken. In this case, the state
(or the state of some other process) must be restored before an interrupt return is
executed.

If the processor does take a trap while handling another interrupt or trap, it enters the
Monitor mode, and the state of the interrupt or trap handler is reflected in the Shadow
Program Counter 0, 1, and 2 registers and the Reason Vector Register. Other
processor state is preserved, including the Current Processor Status Register. This allows
the Monitor trap routine to handle the trap.

3.6 MEMORY MANAGEMENT 

The Am29050 microprocessor incorporates a Memory Management Unit (MMU) for 
performing virtual-to-physical address translation and memory access protection. This
section describes the logical operation of the Memory Management Unit. Related 
issues are discussed in Sections 7.3.3 and 7.3.4. 

Address translation is performed either by one of the two Region Mapping Units
(RMUs), or by the Translation Look-Aside Buffer (TLB). The RMUs map virtual
regions of variable size, ranging from 64 kb to 2 Gb, into regions of physical memory.
Each RMU consists of two protected special-purpose registers, which are described in
Section 3.2.3. Any virtual address not mapped by the RMUs is translated by the TLB.
The TLB maps virtual regions of fixed size, called pages, into physical regions of the
same size, called page frames. The structure of the TLB is described below.

Address translation can be performed only for instruction/data memory accesses. No 
address translation is performed for instruction ROM, input/output, coprocessor or 
interrupt/trap vector accesses. However, an instruction/data memory access can be 
re-directed to input/output by the address-translation process. 

3.6.1 Translation Look-Aside Buffer 

The MMU stores the most-recently performed address translations in a special cache,
the Translation Look-Aside Buffer (TLB). The TLB reflects information in the
processor system page tables, except that it specifies the translation for many fewer pages;
this restriction allows the TLB to be incorporated on the processor chip where the
performance of address translation is maximized.

A diagram of the TLB is shown in Figure 3-52. The TLB is a table of 64 entries,
divided into two equal sets, called Set 0 and Set 1. Within each set, entries are
numbered 0 to 31. Entries in different sets which have equivalent entry-numbers are
grouped into a unit called a line; there are thus 32 lines in the TLB, numbered 0 to 31.



Figure 3-52 Translation Look-Aside Buffer Organization 

Each TLB entry is 64 bits long, and contains mapping and protection information for a
single virtual page. TLB entries may be inspected and modified by processor
instructions executed in the Supervisor mode. The layout of TLB entries is described in
Section 3.2.4.

The TLB stores information about the ownership of the TLB entries in an 8-bit Task 
Identifier (TID) field in each entry. This makes it possible for the TLB to be shared by 
several independent processes without the need for invalidation of the entire TLB as 
processes are activated. It also increases system performance by permitting
processes to warm-start (i.e., to start execution on the processor with a certain number of
TLB entries remaining in the TLB from a previous execution).

Each TLB entry contains a Usage bit to assist management of the TLB entries. The
Usage bit indicates which set's entry within a given line was least recently used
to perform an address translation. Usage bits for two entries in the same line are
equivalent.

The TLB contains other fields which are described in the following sections. 

3.6.2 Address Translation

For the purpose of address translation, the virtual instruction/data address-space of a
process is typically partitioned into regions of fixed size, called pages. Pages are
mapped into equivalent-sized regions of physical memory, called page frames. All
accesses to instructions or data contained within a given page use the same
virtual-to-physical address translation.

In addition to the page-by-page translation provided by the TLB, the Am29050
microprocessor supports translation for variable-sized regions, ranging from 64 kb to 2 Gb,
by means of two Region Mapping Units. Each RMU consists of two special-purpose
registers. In each RMU, a Region Mapping Address Register contains the base
address of the virtual region to be mapped and the base address of the corresponding
physical region. A Region Mapping Control Register specifies the region size and
contains information which is used to control access, including a Task Identifier. The
RMUs have priority over the TLB translation; in addition, RMU0 has priority over
RMU1.

3.6.2.1 ADDRESS TRANSLATION CONTROLS 

The processor attempts to perform address translation for the following external 
accesses: 

1. Instruction accesses, if the Physical Addressing/Instructions (PI) and ROM Enable 
(RE) bits of the Current Processor Status are both 0. 

2. User-mode accesses to instruction/data memory if the Physical Addressing/Data 
(PD) bit of the Current Processor Status is 0. 

3. Supervisor-mode accesses to instruction/data memory if the Physical Address 
(PA) bit of the load or store instruction performing the access is 0, and the PD bit 
of the Current Processor Status is 0. 

Address translation is controlled by the MMU Configuration Register. This register 
specifies the virtual page size, and contains an 8-bit Process Identifier (PID) field. The 
PID field specifies the process number associated with the currently running program, 
if this is a User-mode program. Supervisor-mode programs are assigned a fixed 
process number of zero. The process number is compared with the Task Identifier (TID)
field of the Region Mapping Control Register or the TLB entry, as appropriate, during 
address translation. The TID field must match the process number for the translation 
to be valid. 
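
The controls described above can be summarized in a short sketch. This is a minimal
Python restatement, assuming the relevant bits are passed in as integers; the
function names and the "instruction"/"data" labels are hypothetical, and "data" here
stands for a load or store access to instruction/data memory.

```python
def translation_attempted(kind: str, supervisor: bool,
                          pi: int = 0, re: int = 0,
                          pd: int = 0, pa: int = 0) -> bool:
    """Return True if the processor attempts address translation.

    pi, re, pd are Current Processor Status bits; pa is the Physical Address
    bit of the load or store instruction performing the access.
    """
    if kind == "instruction":
        return pi == 0 and re == 0
    if kind == "data":
        if supervisor:
            return pa == 0 and pd == 0
        return pd == 0
    raise ValueError("kind must be 'instruction' or 'data'")


def process_number(supervisor: bool, pid: int) -> int:
    """Process number compared against the TID field during translation."""
    # Supervisor-mode programs use a fixed process number of zero;
    # User-mode programs use the 8-bit PID field.
    return 0 if supervisor else pid & 0xFF
```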

3.6.2.2 RMU ADDRESS TRANSLATION PROCESS 

In a successful RMU address translation, the most-significant bits of the virtual
address match the corresponding bits of the Virtual Base Address (VBA) field of the
Region Mapping Address Register, and are replaced with the contents of the Physical
Base Address (PBA) field. The number of bits compared and subject to replacement
is determined by the Region Size (RGS) field of the Region Mapping Control Register.
For example, if the region size is 64 kb, 16 bits are compared; if the region size is
128 kb, 15 bits are compared, and so on.
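
The compare-and-replace operation can be illustrated with a short Python sketch. It
assumes 32-bit byte addresses, a region size given directly in bytes rather than as an
encoded RGS value, and base addresses given as full 32-bit values rather than as
register fields; the function names are hypothetical. The TID, VE, and protection
checks are not modeled.

```python
from typing import Optional

ADDRESS_BITS = 32


def compared_bits(region_bytes: int) -> int:
    """Number of most-significant address bits compared for a region size."""
    if region_bytes <= 0 or region_bytes & (region_bytes - 1):
        raise ValueError("region size must be a power of two")
    return ADDRESS_BITS - (region_bytes.bit_length() - 1)


def rmu_translate(va: int, vba: int, pba: int,
                  region_bytes: int) -> Optional[int]:
    """Return the physical address, or None if the virtual base does not match."""
    n = compared_bits(region_bytes)
    base_mask = ((1 << n) - 1) << (ADDRESS_BITS - n)
    offset_mask = 0xFFFFFFFF ^ base_mask
    if (va & base_mask) != (vba & base_mask):
        return None
    # Replace the compared bits with the physical base; keep the offset.
    return (pba & base_mask) | (va & offset_mask)
```

With a 64 kb region, compared_bits returns 16, matching the example in the text; with
a 128 kb region it returns 15.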

For an address translation to be valid, the following conditions must be met: 

1. The most-significant bits of the virtual address, determined by the RGS field,
match the corresponding bits of the VBA field of the Region Mapping Address 
Register. 

2. For a User-mode access, the TID field in the Region Mapping Control Register 
matches the PID field in the MMU Configuration Register. For a Supervisor-mode 
access, the TID field is zero. 

3. The VE bit of the Region Mapping Control Register is 1.

The address space of the physical address is determined by the Input/Output (IO) bit
of the Region Mapping Control Register. If the IO bit is 0, the address is in the
instruction/data memory address space. If the IO bit is 1, the address is in the
input/output address space.

If the address translation is valid, then certain bits of the Region Mapping Control
Register are used to perform protection checking (see Section 3.6.5). If there is no
protection violation, the translation is performed and the resulting physical address is
placed on the processor's Address Bus. If there is a protection violation, a Data or
Instruction MMU Protection Violation trap occurs, depending on the access.

If address translation is valid, and there is no protection violation, the PGM bits from
the Region Mapping Control Register are placed on the MPGM(1-0) outputs during
the address cycle for the access.

If the address translation is not valid in RMU0, it is attempted by RMU1. If the
translation is not valid in RMU1, it is attempted by the TLB.

3.6.2.3 TLB ADDRESS TRANSLATION PROCESS

Virtual addresses are partitioned into three fields for TLB address translation, as
shown in Figure 3-53. The partitioning of the virtual address is based on the page
size. Pages may be of size 1, 2, 4, or 8 kb, as specified by the MMU Configuration
Register.



Figure 3-53 Virtual Address for 1, 2, 4, and 8 kb Pages 

The TLB address-translation process is diagrammed in Figure 3-54. Address translation
is performed by the following fields in the TLB entry: the Virtual Tag (VTAG), the
Task Identifier (TID), the Valid Entry (VE) bit, the Real Page Number (RPN) field, and
the Input/Output (IO) bit. To perform an address translation, the processor accesses
the TLB line whose number is given by certain bits in the virtual address. The bits
used depend on the page size as follows:



Page Size     Virtual Address Bits (for Line Access)

1 kb          14-10
2 kb          15-11
4 kb          16-12
8 kb          17-13
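
The line selection above amounts to extracting a 5-bit field whose position moves up
by one bit each time the page size doubles. The following Python sketch shows one way
to express this, assuming the page size is passed as a number of kb (1, 2, 4, or 8);
the function name is hypothetical.

```python
# Lowest virtual-address bit used for line access, per page size in kb.
LINE_FIELD_LSB = {1: 10, 2: 11, 4: 12, 8: 13}


def tlb_line(va: int, page_kb: int) -> int:
    """Return the TLB line number (0 to 31) selected by a virtual address."""
    shift = LINE_FIELD_LSB[page_kb]
    return (va >> shift) & 0x1F   # five bits select one of 32 lines
```

For 1 kb pages, for instance, the field is bits 14-10, so an address of 0x00007C00
selects line 31.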



The accessed line contains two TLB entries, which in turn contain two VTAG fields. 
The VTAG fields are both compared to bits in the virtual address. This comparison 
depends on the page size as follows (note that VTAG bit-numbers are relative to the 
VTAG field, not the TLB entry): 

Certain bits of the VTAG field do not participate in the comparison for page sizes
larger than 1 kb. These bits of the VTAG field are required to be zero.



Figure 3-54 TLB Address Translation Process 

For an address translation to be valid, the following conditions must be met:

1. The virtual address bits match corresponding bits of the VTAG field as specified
above.

2. For a User-mode access, the TID field in the TLB entry matches the PID field in
the MMU Configuration Register. For a Supervisor-mode access, the TID field is
zero.

3. The VE bit in the TLB entry is 1.

4. Only one entry in the line meets conditions 1, 2, and 3 above. If this condition is
not met, the results of the translation may be treated as valid by the processor, but
the results are unpredictable.
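
The four conditions can be sketched as a check over the two entries of the selected
line. This is a minimal Python model in which the result of the VTAG comparison is
supplied as a flag; the class, field, and function names are hypothetical. Raising an
error when both entries match is a choice made for this sketch only, since the
processor's behavior in that case is unpredictable.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TlbEntry:
    tag_match: bool   # condition 1: VTAG comparison result, computed elsewhere
    tid: int          # Task Identifier field
    ve: int           # Valid Entry bit


def entry_valid(entry: TlbEntry, supervisor: bool, pid: int) -> bool:
    """Conditions 1 through 3 for a single entry."""
    process = 0 if supervisor else pid
    return entry.tag_match and entry.tid == process and entry.ve == 1


def matching_set(line: List[TlbEntry], supervisor: bool,
                 pid: int) -> Optional[int]:
    """Return the set number of the single valid entry, or None on a miss."""
    hits = [i for i, e in enumerate(line) if entry_valid(e, supervisor, pid)]
    if len(hits) > 1:
        # Condition 4 violated: modeled here as an error.
        raise ValueError("more than one entry matches; result unpredictable")
    return hits[0] if hits else None
```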

If the address translation is valid for one TLB entry in the selected line, the RPN field
in this entry is used to form the physical address of the access. The RPN field gives
the portion of the physical address that depends on the translation; the remaining
portion of the virtual address, called the Page Offset, is invariant with address
translation.

The Page Offset comprises the low-order bits of the virtual address, and gives the 
location of a byte (because of byte addressing) within the virtual page. This byte is 
located at the same position in the physical page frame, so the Page Offset also 
comprises the low-order bits of the physical address. 

The 32-bit physical address is the concatenation of certain bits of the RPN field and 
Page Offset, where the bits from each depend on the page size as follows (note that 
RPN bit numbers are relative to the RPN field, not the TLB entry): 

Page Size    RPN Bits    Virtual Address Bits for Page Offset

1 kb         21-0        9-0

2 kb         21-1        10-0

4 kb         21-2        11-0

8 kb         21-3        12-0

Note: Certain bits of the RPN field are not used in forming the physical address for 
page sizes greater than 1 kb. These bits of the RPN are required to be zero. In 
addition, for certain instruction accesses, the Page Offset is incremented by 8 
or 16 as described in Section 4.2.3.1. 
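The concatenation described above can be illustrated with a short Python sketch. The names `physical_address` and `UNUSED_RPN_BITS` are hypothetical, page sizes are given in bytes, and the Page Offset increment for certain instruction accesses is not modeled.

```python
# Number of low-order RPN bits that are unused (and required to be zero),
# keyed by page size in bytes: 1 kb uses RPN bits 21-0, 8 kb uses bits 21-3.
UNUSED_RPN_BITS = {1024: 0, 2048: 1, 4096: 2, 8192: 3}


def physical_address(rpn, virtual_address, page_size):
    """Concatenate the used RPN bits with the Page Offset to form a 32-bit address."""
    unused = UNUSED_RPN_BITS[page_size]
    offset_bits = 10 + unused                      # 10 bits for 1 kb, up to 13 for 8 kb
    page_offset = virtual_address & ((1 << offset_bits) - 1)
    frame = (rpn & 0x3FFFFF) >> unused             # RPN bits 21 down to `unused`
    return (frame << offset_bits) | page_offset
```

In every case the used RPN bits and the Page Offset bits together total 32 bits, which is why the RPN contributes fewer bits as the page size grows.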

The address space of the physical address is determined by the Input/Output (IO) bit
of the TLB entry. If the IO bit is 0, the address is in the instruction/data memory
address space. If the IO bit is 1, the address is in the input/output address space.

If an address translation is successful, the TLB entry is further used to perform
protection checking for the access. Bits in the TLB make it possible to restrict accesses,
independently for Supervisor-mode and User-mode accesses, to any combination of
load, store, and instruction accesses, or to no access. Section 3.6.5 describes
protection in more detail.

If the address translation is valid, and no protection violation is detected, the physical
address from the translation is placed on the processor's Address Bus, and the access
is initiated. If the translation is not valid, or a protection violation is detected, a
trap occurs. Depending on the state of the channel interface, the access request may
be placed on the Address Bus with the signal BINV asserted, even though the trap
occurs.

Also, if the address translation is successful, and there is no protection violation, the
PGM bits from the TLB entry used for translation are placed on the MPGM(1-0) outputs
during the address cycle for the access. If address translation is not performed,
these pins are both Low for the address cycle.

If the TLB cannot translate an address, a TLB miss occurs. The MMU causes a trap if
either a TLB miss occurs, or the translation is successful and a protection violation is
detected. The processor distinguishes between traps caused by instruction and data
accesses, and between traps caused by User- and Supervisor-mode accesses, as
follows:

Trap Vector Number    Type of Trap

8                     User-Mode Instruction TLB Miss

9                     User-Mode Data TLB Miss

10                    Supervisor-Mode Instruction TLB Miss

11                    Supervisor-Mode Data TLB Miss

12                    Instruction TLB Protection Violation

13                    Data TLB Protection Violation

The distinction between the above traps is made to assist trap handling, particularly 
the routines that load TLB entries. 
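As an illustration of how the trap type follows from the access, the selection can be written as a small function. The function name is hypothetical, and the sketch takes the vector numbers to be the consecutive values 8 through 13 in the order the trap types are listed.

```python
def mmu_trap_vector(tlb_miss, instruction_access, supervisor_mode):
    """Return the vector number of the MMU trap for a failed access.

    tlb_miss=True selects a TLB miss trap; tlb_miss=False means the
    translation succeeded but a protection violation was detected.
    """
    if tlb_miss:
        base = 10 if supervisor_mode else 8        # miss traps are split by program mode
        return base + (0 if instruction_access else 1)
    # Protection violations are split only by instruction versus data access.
    return 12 if instruction_access else 13
```

A TLB reload routine installed at vector 9, for instance, knows without further decoding that it is servicing a User-mode data access.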



3.6.3 TLB Reload 



So that the MMU may support a large variety of memory-management architectures, it
does not directly load TLB entries that are required for address translation. It simply
causes a TLB miss trap when address translation is unsuccessful. The trap causes a
program, called the TLB reload routine, to execute. The TLB reload routine is defined
according to the structure and access method of the page table contained in an
external device or memory.

When a TLB miss trap occurs, the LRU Recommendation Register is written with the
TLB register number for Word 0 of the TLB entry to be used by the TLB reload routine.
For instruction accesses, the Program Counter 1 Register contains the instruction
address that was not successfully translated. For data accesses, the Channel
Address Register contains the data address that was not successfully translated.

The TLB reload routine determines the translation for the address given by the Pro- 
gram Counter 1 Register or Channel Address Register, as appropriate. The TLB 
reload routine uses an external page table to determine the required translation, and 
loads the TLB entry indicated by the LRU Recommendation Register so that the entry 
may perform this translation. In a demand-paged environment, the TLB reload routine
may additionally invoke a page-fault handler when the translation cannot be performed.

TLB entries are written by the Move To TLB (MTTLB) instruction, which copies the
contents of a general-purpose register into a TLB register. The TLB register number is
specified by bits 6-0 of a general-purpose register. TLB entries are read by the Move
From TLB (MFTLB) instruction, which copies the contents of a TLB register into a
general-purpose register. Again, the TLB register number is specified by a
general-purpose register.

3.6.4 TLB Entry Invalidation 

There are two methods for invalidating TLB entries that are no longer required at a
given point in program execution. The first involves resetting the Valid Entry bit of a
single entry (this is done by a Move To TLB instruction). The second involves changing
the value of the Process Identifier (PID) field of the MMU Configuration Register;
this invalidates all entries whose Task Identifier (TID) fields do not match the new
value.

If an entry is invalidated by changing the PID field, the TLB entry still remains valid in 
some sense. If the PID field is changed again to match the TID field, the entry may 
once again participate in address translation. This ability can be used to reduce the 
number of TLB misses in a system during process switching. However, it is important 
to manage TLB entries so that an invalid match cannot occur between the PID field 
and the TID field of an old TLB entry. 



3.6.5 Protection 



If an address translation is performed successfully as described in Section 3.6.2, the
Region Mapping Control Register or TLB entry used in address translation is used to
perform protection checking for the access. Six bits are used for this purpose; their
names and functions are the same in the Region Mapping Control Registers and the
TLB entries: Supervisor Read (SR), Supervisor Write (SW), Supervisor Execute (SE),
User Read (UR), User Write (UW), and User Execute (UE). These bits restrict accesses,
depending on the program mode of the access, as shown in Table 3-12 (the
value X is a don't care).

Note that for the Load and Set (LOADSET) instruction, the protection bits must be set
to allow both the load and store access. If this condition does not hold, neither access
is performed.
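The protection check can be sketched as follows. This is a rough model under assumptions: the function name and the string labels for access types are hypothetical, and a bit value of 1 is assumed to permit the corresponding access (the encoding given in Table 3-12 is not reproduced here).

```python
def access_allowed(bits, supervisor_mode, access):
    """Decide whether an access passes protection checking.

    bits:   dict with keys "SR", "SW", "SE", "UR", "UW", "UE", each 0 or 1
            (assumption: 1 permits the access).
    access: "load", "store", "instruction", or "loadset".
    """
    prefix = "S" if supervisor_mode else "U"       # select the bits for the program mode
    if access == "loadset":
        # LOADSET needs both the load and the store to be permitted.
        return bits[prefix + "R"] == 1 and bits[prefix + "W"] == 1
    suffix = {"load": "R", "store": "W", "instruction": "E"}[access]
    return bits[prefix + suffix] == 1
```

Because the Supervisor and User bits are independent, a page can be, for example, writable by the Supervisor while being read-only to User-mode programs.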

If protection checking indicates that a given access is not allowed, a Data MMU
Protection Violation or Instruction MMU Protection Violation trap occurs. The cause of the
trap can be determined by inspecting the Program Counter 1 Register for an Instruction
MMU Protection Violation, or by inspecting the contents of the Channel Address
and Channel Control registers for a Data MMU Protection Violation.

3.7 DEBUGGING



Software debugging is supported by the Trace Facility, hardware breakpoints, and the 
Monitor mode. The Trace Facility guarantees exactly one trap after the execution of 
any instruction in a program being tested. The Trace trap allows a debug routine to 
follow the execution of instructions, and to determine the state of the processor and
system at the end of each instruction. Hardware breakpoints return control to a 
debugger at specified program addresses. The Monitor mode allows the debugging of 
operating-system routines and interrupt and trap handlers. 



3.7.1 Trace Facility 

Tracing is controlled by the Trace Enable (TE) and Trace Pending (TP) bits of the
Current Processor Status Register. The value of the TE bit is always copied into the
TP bit when an instruction enters the write-back stage. A Trace trap occurs whenever
the TP bit is 1. As with most traps, the Trace trap can be disabled only by the DA bit
of the Current Processor Status Register.

In order to trace the execution of a program, the debug routine performs an interrupt
return to cause the program to begin or resume execution. However, before the interrupt
return is executed, the TE and TP bits of the Old Processor Status are set with
the values 1 and 0, respectively. The interrupt return causes these bits to be copied
into the TE and TP bits of the Current Processor Status.

When the target of the interrupt return (whose address is contained in the Program
Counter 1 Register when the interrupt return is executed) enters the write-back stage,
the processor copies the value of the TE bit into the TP bit. Since the TP bit is a 1, a
Trace trap occurs. This trap prevents any further instruction execution in the target
routine until the interrupt is taken and the routine is resumed with an interrupt return.
When the Trace trap is taken, the TE and TP bits are both reset automatically,
preventing any further Trace traps.
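The single-step sequence just described can be followed in a small simulation. This is a simplified model, not a description of the hardware: the class and function names are hypothetical, only the TE and TP bits are represented, and copying the Current Processor Status into the Old Processor Status when the trap is taken is an assumption of the model.

```python
class TraceModel:
    """Tracks only the TE and TP bits of the Current and Old Processor Status."""

    def __init__(self):
        self.cps = {"TE": 0, "TP": 0}
        self.ops = {"TE": 0, "TP": 0}
        self.executed = 0   # instructions that reached write-back
        self.traps = 0      # Trace traps taken

    def iret(self):
        # An interrupt return copies the Old status bits into the Current status.
        self.cps = dict(self.ops)

    def step(self):
        """One instruction enters write-back; return True if a Trace trap follows."""
        self.executed += 1
        self.cps["TP"] = self.cps["TE"]            # TE is copied into TP
        if self.cps["TP"] == 1:
            self.ops = dict(self.cps)              # assumed save of status on the trap
            self.cps = {"TE": 0, "TP": 0}          # TE and TP reset automatically
            self.traps += 1
            return True
        return False


def single_step(model):
    """What a debug routine does to trace exactly one instruction."""
    model.ops["TE"], model.ops["TP"] = 1, 0
    model.iret()
    return model.step()
```

Each call to `single_step` lets one instruction of the traced program complete and then returns control to the debug routine through a Trace trap.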

Since the Trace Facility is managed by the Old and Current Processor Status registers,
it operates properly in the event that the processor takes an interrupt or trap
(one that is unrelated to the Trace Facility) before the above trace sequence completes.
When the unrelated interrupt or trap is taken, the state of the Trace Facility (i.e., the
values of the TE and TP bits) is copied into the Old Processor Status from the Current
Processor Status. The Trace Facility then resumes operation when the interrupted
routine is restarted by an interrupt return.

Note that it is possible to cause a Trace trap by directly setting the TP and/or TE bits 
in the Current Processor Status Register. This may be accomplished only by a Super- 
visor-mode program. 

3.7.2 Instruction Breakpoints 

The Am29050 microprocessor provides two hardware breakpoints for causing Trace 
traps at specified instruction addresses. These hardware breakpoints are specified by 
the following registers: Instruction Breakpoint Address 0, Instruction Breakpoint Con- 
trol 0, Instruction Breakpoint Address 1, and Instruction Breakpoint Control 1. The two 
hardware breakpoints are identical in definition and capability. 

Breakpoint comparisons are performed by both hardware breakpoints on instructions
as the instructions enter the execute stage of the processor pipeline. If one (or both)
of the breakpoint comparisons is valid, a Trace trap occurs and the instruction is not
completed. The Trace trap caused by a hardware breakpoint has lower priority than a
Trace trap caused by the Trace Facility. Also, if the DA bit in the Current Processor
Status Register is 1 when the Trace trap occurs, a Monitor trap is taken.

A breakpoint comparison is valid when the instruction address matches the address in
the Instruction Breakpoint Address 0 Register or Instruction Breakpoint Address 1
Register, and the following conditions are met by the corresponding Instruction
Breakpoint Control register:

1. The Breakpoint Has Occurred (BHO) bit is 0. The BHO bit allows the processor to
progress beyond the breakpoint once it has been encountered.

2. The Breakpoint Enable (BEN) bit is 1.

3. The Break or Synchronize (BSY) bit is 1. If the BSY bit is 0 and all other
conditions are valid, a synchronization pulse is generated externally by placing the
value 010 on the STAT(2-0) outputs for one cycle (see Section 5.3). This permits
the hardware breakpoint to generate a trigger for external logic, without causing a
Trace trap that disturbs system timing.

4. The value of the Break ROM (BRM) bit is equal to the value of the ROM Enable bit
in the Current Processor Status Register. This differentiates between a breakpoint
in the instruction/data memory and one in the instruction ROM.

5. The value of the Break on Translation Enabled (BTE) bit is equal to the 
complement of the Physical Addressing/Instructions (PI) bit in the Current 
Processor Status Register. This differentiates between a physical breakpoint 
address and a virtual breakpoint address. 

6. If address translation is enabled for instructions, the Breakpoint Process Identifier 
(BPID) field matches the PID field of the MMU Configuration Register, for a 
User-mode program. For a Supervisor-mode program, the BPID field must be 
zero. The BPID field allows the breakpoint to be associated with a particular 
process in a multi-tasking system. 
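The six conditions can be combined into a single predicate, sketched below. The function name, the dictionary layout, and the return labels are hypothetical. The sketch also assumes that a PI bit of 0 means address translation is enabled for instructions, and that the Supervisor Mode (SM) bit of the Current Processor Status Register identifies a Supervisor-mode program.

```python
def breakpoint_action(bp, cps, pid, instruction_address):
    """Evaluate one hardware breakpoint against an instruction entering execute.

    bp:  dict with keys ADDR, BHO, BEN, BSY, BRM, BTE, BPID
    cps: dict with keys RE, PI, SM (Current Processor Status bits)
    Returns "trap", "sync" (synchronization pulse), or None.
    """
    if instruction_address != bp["ADDR"]:
        return None
    if bp["BHO"] != 0 or bp["BEN"] != 1:           # conditions 1 and 2
        return None
    if bp["BRM"] != cps["RE"]:                     # condition 4
        return None
    if bp["BTE"] != 1 - cps["PI"]:                 # condition 5: BTE equals complement of PI
        return None
    if cps["PI"] == 0:                             # condition 6 (assumed: PI = 0 enables translation)
        expected_bpid = 0 if cps["SM"] == 1 else pid
        if bp["BPID"] != expected_bpid:
            return None
    # Condition 3: BSY selects between a Trace trap and an external pulse.
    return "trap" if bp["BSY"] == 1 else "sync"
```

The BSY bit is checked last because it does not decide whether the comparison is valid, only which of the two outcomes results.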

When a hardware breakpoint trap is taken, the processor sets the BHO bit. If the
Trace trap handler returns to the routine with the breakpoint enabled, the BHO bit
being 1 prevents the breakpoint comparison from causing another Trace trap. The
processor resets the BHO bit when it encounters the breakpoint upon return, so that
the Trace trap is once again enabled.

A hardware-development system (see Section 5.4) can use the hardware breakpoints 
to cause the processor to enter the Halt mode rather than take a Trace trap. 

3.7.3 Debugging System-Level Routines 

The Monitor mode provides a mechanism for debugging interrupt handlers and other
system-level routines using a software debugger. The processor can enter the Monitor
mode without affecting the state of any running program, much as it can take an
interrupt without disturbing the state of an application program.

When a Trace trap occurs, and the DA bit in the Current Processor Status Register is
1, the processor takes a Monitor trap. The instruction addresses of the trapped program
are contained in the Shadow Program Counters, the cause of the trap is encoded
in the Reason Vector Register, and the Current Processor Status Register is
unmodified (except that the Monitor Mode bit is set). This provides the information
required to debug the trapped routine, regardless of whether the trapped routine was
enabled to take interrupts. The Monitor trap handler can resume the execution of the
trapped routine (e.g. for tracing) by executing an IRET or IRETINV instruction.

3.8 SERIALIZATION

The Am29050 microprocessor overlaps external data references with other opera- 
tions, and typically performs floating-point operations in parallel with each other and 
with integer operations. When an external data reference must be restarted, however, 
or a floating-point operation causes a Floating-Point Exception trap to be taken, the 
processor context must be the same as when the operation was first attempted. To 
ensure this, certain operations are serialized. 

The processor serializes by entering the Pipeline Hold mode in any of the following 
circumstances: 

1. An external access is not yet completed, and one of the following instructions is
encountered:

Move to Special Register 

Move to Special Register Immediate 

Move to TLB 

Interrupt Return 

Interrupt Return and Invalidate 

Halt 

2. An external access is not yet completed, and an interrupt or trap, other than a
WARN trap, is taken.

3. The processor detects that a floating-point instruction may cause an unmasked
floating-point exception. In this case, the instruction is issued for execution, but
the pipeline holds until execution of the instruction is completed.

Writes to certain registers (the Floating-Point Environment Register, the Integer
Environment Register, the Floating-Point Status Register, and the Exception Opcode
Register) could, if overlapped with arithmetic operations, change the context required
or expected by the arithmetic operations, or could conflict with register updates
caused by the operations. Writes to these registers are therefore serialized; that is,
they are not performed until the completion of all operations performed by the
floating-point unit.

Similarly, reading the Floating-Point Status Register concurrently with the execution 
of a floating-point instruction might not obtain the status of previously issued instruc- 
tions. Therefore, reads of the Floating-Point Status Register are also serialized with 
floating-point operations. 

If the processor is in the Pipeline Hold mode due to serialization, it enters the
Executing mode once the external access or floating-point operation is completed. Note that
the processor may immediately take a Data Access Exception, Coprocessor Exception,
or Floating-Point Exception trap.

3.9 INITIALIZATION

When power is first applied to the processor, it is in an unknown state, and must be
placed in a known state. Also, under certain circumstances, it may be necessary to
place the processor in a defined state. This is accomplished by the Reset mode,
which is invoked by activating the RESET pin for at least four cycles. The Reset mode
configures the processor state as follows:

1. Instruction execution is suspended.

2. Instruction fetching is suspended. 

3. Any interrupt or trap conditions are ignored. 

4. The Current Processor Status Register is set as shown in Figure 3-55.

Figure 3-55 Current Processor Status Register in Reset Mode

Figure 3-56 Floating-Point Environment Register in Reset Mode

5. The Cache Disable bit of the Configuration Register is set.

6. The Data Width Enable bit of the Configuration Register is reset.

7. The Early Load Enable bit of the Configuration Register is reset.

8. The Floating-Point Environment Register is set as shown in Figure 3-56.

9. The Integer Division Overflow Exception Mask and the Integer Multiplication 
Overflow Exception Mask bits of the Integer Environment Register are both set. 

10. The Contents Valid bit of the Channel Control Register is reset. 

Except as previously noted, the contents of all general-purpose registers, special-pur- 
pose registers, floating-point accumulator registers, and TLB registers are undefined. 
The contents of the Branch Target Cache memory are also undefined. 

The Reset mode also configures the processor to initiate an instruction fetch using an
address of zero. Since the ROM Enable (RE) bit of the Current Processor Status is 1,
this fetch is directed to external instruction read-only memory. This fetch occurs when
the Reset mode is exited (i.e., when the RESET input is de-asserted). Section 5.5
contains more information on this instruction fetch.

This chapter describes the operation of the Am29050 microprocessor pipeline, and
the processor's three major functional units. The functional units are the Instruction
Fetch Unit, the Execution Unit, and the Memory Management Unit. These units, which
were shown in abstract form in Figure 2-2, are shown in detail in Figure 4-1.



Figure 4-1 Am29050 Microprocessor Data Flow

The operation of the functional units is coordinated by the Pipeline Hold mode, which
ensures that operations are performed in the proper order. This chapter also describes
the Pipeline Hold mode.

Since this chapter describes the internal operation of the Am29050 microprocessor, it 
provides information that may not be required by some users. However, it aids in 
understanding the behavior of the Am29050 microprocessor under certain conditions, 
especially the behavior of the system interfaces described in Chapter 5. 

4.1 FOUR-STAGE PIPELINE 

The Am29050 microprocessor implements a four-stage pipeline for integer instruction
execution. The four stages are fetch, decode, execute, and write-back. The execute
stage of floating-point operations is pipelined to a depth determined by the latency of
the operation. For either integer or floating-point operations, the pipeline is organized
so that the effective instruction-execution rate may be as high as one instruction per
cycle.

During the fetch stage, the Instruction Fetch Unit (Section 4.2) determines the location
of the next processor instruction, and issues the instruction to the decode stage. The
instruction is fetched from the Instruction Prefetch Buffer, the Branch Target
Cache memory, or an external instruction memory.

During the decode stage, the Execution Unit (Section 4.3) decodes the instruction
selected during the fetch stage, and fetches and/or assembles the required operands.
It also evaluates addresses for branches, loads, and stores.

During the execute stage, the Execution Unit performs the operation specified by the 
instruction. In the case of branches, loads, and stores, the Memory Management Unit 
(Section 4.4) performs address translation if required. In the case of an early load, the 
physical address is transmitted to an external device or memory. The execution unit 
pipelines floating-point operations to a depth greater than one cycle, as described in 
Section 4.3.7. 

During the write-back stage, the results of the operation performed during the execute
stage are stored. In the case of branches, loads, and stores, the physical address
resulting from translation during the execute stage is transmitted to an external device
or memory, unless an early load occurs.

Most pipeline dependencies that are internal to the processor are handled by forwarding
logic in the processor. For those dependencies that result from the external system,
the Pipeline Hold mode ensures proper operation.

In a few special cases, the processor pipeline is exposed to software executing on the 
Am29050 microprocessor (see Section 7.4). 

4.2 INSTRUCTION FETCH UNIT 

The Instruction Fetch Unit performs the functions required to keep the processor
pipeline supplied with instructions. Since the processor can execute one instruction
per cycle, instructions must be supplied at this rate if the execution stage is to perform
at the maximum rate. To accomplish this, the Instruction Fetch Unit contains mechanisms
for requesting instructions from instruction memory before they are required for
execution, and for caching the most-recently executed branch target instructions.

The Instruction Fetch Unit also incorporates the logic necessary to calculate and
sequence instruction addresses. The processor is word-oriented, but generates byte
addresses for all external accesses. Since all processor instructions are word-length,
and are aligned on word-address boundaries, the Instruction Fetch Unit deals only
with 30-bit addresses. For external instruction accesses, these addresses are appended
with 00 in the two least-significant bits to form the required 32-bit address
(note that the two least-significant bits of an external instruction address may not be
00 for indirect jumps).

4.2.1 Instruction Prefetch Buffer 

All instructions executed by the processor are fetched either from the Branch Target
Cache memory or from external instruction memory (i.e., instruction/data memory or
instruction read-only memory). When instructions are fetched from the external memory,
they are requested in advance to assist the timing of instruction accesses. The
processor attempts to initiate the fetch for any given instruction at least four cycles
before it is required for execution.

Since instructions are requested in advance, based on a predicted need, it is possible
that a prefetched instruction is not required immediately for execution when the
prefetch completes. To accommodate this possibility, the Instruction Fetch Unit contains
a four-word Instruction Prefetch Buffer (IPB), as shown in Figure 4-1. The IPB
is a circularly addressed buffer which acts as a first-in/first-out (FIFO) queue for
instructions.

If instruction fetching is enabled, the processor requests an external instruction fetch
on any cycle for which the IPB contains an available location. Instructions are stored
in the IPB as they are returned from the external instruction memory. An instruction is
stored into the IPB location whose number is given by bits 3-2 of the instruction
address.

The instruction is held in the IPB until it is required for execution. When required,
the instruction is sent to the decode stage, and the IPB location is freed to receive a
subsequent instruction.

4.2.1.1 INSTRUCTION PREFETCH STREAM 

An instruction prefetch stream is established whenever the processor performs a
non-sequential instruction reference. Non-sequential references normally occur as the
result of successful branches, but may also result either from the taking of an interrupt
or trap (including the WARN trap) or from an interrupt return.

A non-sequential instruction fetch is initiated by placing an instruction-fetch request on
the Address Bus. Once the external instruction fetch has been initiated, the processor
generates prefetches for subsequent instructions based on the availability of IPB
locations, either by transmitting subsequent addresses, or by issuing burst-mode
instruction requests.

The addresses for prefetched instructions are computed by a word-length register
called the Instruction Fetch Pointer (IFP), which is maintained by the Instruction Fetch
Unit. The IFP latches the physical instruction-address obtained from the Memory
Management Unit whenever a non-sequential instruction reference occurs. Then, for
instruction prefetches, an 8-bit incrementer associated with the IFP updates bits 9-2
of the IFP to point to sequential instructions in the prefetch stream. The incrementer is
limited to eight bits because it increments physical addresses, and thus cannot increment
beyond any possible virtual-page boundaries (recall that the minimum virtual
page size is 1 kb). If the incrementer overflows, as indicated by a carry-out, prefetching
is preempted. The prefetch stream is later re-established as described below.
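The behavior of the 8-bit incrementer can be illustrated as follows. The function name is hypothetical, and the wrapped value returned on overflow is an artifact of the model only: the text does not state what the IFP holds after a carry-out, only that prefetching is preempted.

```python
def next_prefetch_address(ifp):
    """Advance the IFP to the next sequential word.

    Only bits 9-2 are incremented; bits 31-10 and bits 1-0 are left unchanged.
    Returns (new_ifp, carry_out), where carry_out=True marks the point at
    which prefetching would be preempted (a 1-kb boundary was reached).
    """
    field = (ifp >> 2) & 0xFF                      # current value of bits 9-2
    incremented = field + 1
    carry_out = incremented > 0xFF                 # overflow of the 8-bit incrementer
    new_ifp = (ifp & ~0x3FC) | ((incremented & 0xFF) << 2)
    return new_ifp & 0xFFFFFFFF, carry_out
```

Since bits 9-2 address the 256 words of a 1-kb region, the carry-out occurs exactly when the prefetch stream would cross into the next minimum-size page, where the physical address can no longer be assumed contiguous.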

The physical address in the IFP is always the address of the most-recently prefetched
instruction, even though this address may not appear on the Address Bus for
burst-mode fetches. If the burst is externally preempted, the IFP is used to
re-establish the burst at the point of preemption.

4.2.1.2 INSTRUCTION PREFETCH BUFFER STATES

Four states are associated with each Instruction Prefetch Buffer location. The state- 
transition diagram for these states is shown in Figure 4-2. 



Figure 4-2

Available: The IPB location is free for a new fetch. It contains no valid instruction,
and is not due to receive any requested instruction.

Allocated: The IPB location has been scheduled to receive a requested instruction,
which has not yet been returned from the external instruction memory.

Valid: The IPB location contains a valid instruction.

Error: The IPB location contains an instruction which was returned from the external
memory with an IERR indication.

If all internal conditions are such that an instruction fetch can occur, the IPB location
given by bits 3-2 of the instruction address is set to the Allocated state, and the
instruction is requested externally. Once this instruction is returned to the processor, it
is stored in the IPB location. The location is set to the Valid or Error state (based on
the IERR input), unless the instruction is sent immediately to the decode stage, in
which case the buffer is set to the Available state.

The instruction remains in the buffer until it is required for execution. When the
instruction is required, it is issued to the decode stage, and the IPB location is set to the
Available state. If the buffer was in the Error state, it is still set to the Available state,
but an Instruction Access Exception trap occurs.

It is possible for all IPB locations to be in the Available or Valid states, but only one is 
allowed to be in the Allocated state at any given time. This restricts the number of 
unsatisfied instruction prefetches to one, reducing the amount of logic required to 
keep track of external fetches. It additionally restricts the number of apparent pipeline 
stages in the external prefetch mechanism to one stage (the other stages involved in 
the four-stage prefetch pipeline are the request stage and the processor's fetch and 
decode stages). Larger external prefetch pipelines may be implemented, but they are 
required to appear as single-stage pipelines; at most, one instruction can be returned 
to the processor from the old instruction prefetch stream after a non-sequential fetch 
occurs. 

When a non-sequential fetch occurs, all buffer locations are set to the Available state 
during the execute stage of the non-sequential fetch. All instruction requesting for the 
previous prefetch stream is terminated at this time. There is at most one instruction 
that will be returned to the processor after instruction fetches are terminated; this 
instruction is returned before any instruction associated with the new instruction 
stream is requested externally. 
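The state transitions described in this section can be summarized in a small model. This is a sketch only: the class and method names are hypothetical, the case in which a returned instruction is sent immediately to the decode stage is omitted, and the single instruction that may still be returned from the old stream after a non-sequential fetch is not modeled.

```python
from enum import Enum


class IpbState(Enum):
    AVAILABLE = "Available"
    ALLOCATED = "Allocated"
    VALID = "Valid"
    ERROR = "Error"


def ipb_index(instruction_address):
    """IPB location number: bits 3-2 of the instruction address."""
    return (instruction_address >> 2) & 0x3


class PrefetchBuffer:
    def __init__(self):
        self.state = [IpbState.AVAILABLE] * 4      # four-word buffer

    def request(self, instruction_address):
        """Start an external fetch; the target location becomes Allocated."""
        if IpbState.ALLOCATED in self.state:
            raise RuntimeError("only one location may be Allocated at a time")
        index = ipb_index(instruction_address)
        if self.state[index] is not IpbState.AVAILABLE:
            raise RuntimeError("IPB location is not Available")
        self.state[index] = IpbState.ALLOCATED
        return index

    def returned(self, index, ierr=False):
        """The requested instruction arrives; IERR selects Error over Valid."""
        self.state[index] = IpbState.ERROR if ierr else IpbState.VALID

    def issue(self, index):
        """Send the instruction to decode; True means an Instruction Access Exception."""
        trap = self.state[index] is IpbState.ERROR
        self.state[index] = IpbState.AVAILABLE
        return trap

    def flush(self):
        """Non-sequential fetch: every location returns to Available."""
        self.state = [IpbState.AVAILABLE] * 4
```

Note that an error reported through IERR is recorded when the instruction arrives but causes a trap only if the instruction is actually issued; a flush discards it silently.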



The Error state is provided only to handle errors reported via the IERR input. However,
there are many other situations in which the IPB does not contain a valid instruction.
These situations arise because of errors, such as memory-management
protection violations, and because instruction fetching is sometimes preempted, such
as is the case when the IFP adder overflows. All of these cases are indicated by the
fact that the IPB location is in the Available state when the instruction is required for
execution (note that the location should, normally, at least be in the Allocated state
when the instruction is required).

If the processor requires an instruction from an IPB location that is in the Available
state, it initiates the fetch for the instruction using the current value of the Program
Counter. This fetch resolves the exceptional condition. It either performs an address
translation with the proper address, eliminating page-boundary-crossing problems,
or re-creates an error condition, in which case a trap occurs.

4.2.2 Branch Target Cache Memory 

The Branch Target Cache memory on the Am29050 microprocessor allows fast ac- 
cess to instructions fetched non-sequentially. A branch instruction may execute in a 
single cycle, if the branch target is in the Branch Target Cache memory. 

The target of a non-sequential fetch is in the Branch Target Cache memory if a similar 
fetch to the same target has occurred recently enough that it has neither been re- 
placed by the target of another non-sequential fetch, nor invalidated by an INV or 
IRETINV instruction. 

4.2.2.1 BRANCH TARGET CACHE MEMORY ORGANIZATION 

The Branch Target Cache memory (BTC) is a 1-kb storage array which contains
blocks of instructions from recently taken branches. To improve the proportion of
successful searches in the BTC memory, it is organized as a two-way set-associative
memory. Each set contains 128 32-bit words (each instruction occupies one word).
The sets are divided into blocks of either four instructions each or two instructions
each, depending on the value of the Branch Target Cache memory organization (CO)
bit of the Configuration Register. Blocks which lie in different sets but have the same
block number constitute a unit called a line. Figure 4-3 shows the organization of the
BTC memory when the CO bit is 0. Figure 4-4 shows the organization of the BTC
memory when the CO bit is 1.
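The arithmetic behind this organization can be checked with a short sketch. The function name and dictionary keys are hypothetical; the figures themselves (two sets, 128 words per set, four instructions per block when the CO bit is 0 and two when it is 1) come from the text.

```python
def btc_geometry(co_bit):
    """Derive block, line, and byte counts for the Branch Target Cache memory."""
    sets = 2
    words_per_set = 128
    bytes_per_word = 4
    instructions_per_block = 4 if co_bit == 0 else 2
    blocks_per_set = words_per_set // instructions_per_block
    return {
        "instructions_per_block": instructions_per_block,
        "blocks_per_set": blocks_per_set,
        "lines": blocks_per_set,                   # one line per block number
        "total_bytes": sets * words_per_set * bytes_per_word,
    }
```

With the CO bit set to 1, the cache holds twice as many branch targets (64 lines instead of 32), at the cost of caching only two instructions per target instead of four.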



Figure 4-3 

A 29-bit cache tag is associated with each block. Of the 29 bits, 26 are derived from 
the address (possibly virtual) of the instructions in the block and are called the 
Address Tag. 

Note that the Address Tag is 26 bits in length, rather than 24 bits as might be implied 
by the organization of the Branch Target Cache memory. The reason for this is that 
branch target instruction sequences are aligned on cache-block boundaries, and 
cache blocks are not aligned with respect to memory addresses. Thus, more bits are 
required in the Address Tag than would be required if cache locations were mapped 
one-to-one to memory locations. 

Three additional bits in the cache tag, called the Space Identification field (Space ID), 
indicate the instruction memory from which the instructions were fetched 
(instruction/data or read-only memory), whether the instructions were fetched from a 
virtual or physical address space, and the program mode under which the instructions 
were fetched (Supervisor or User). When instructions are placed into the Branch Target 
Cache memory, the Space ID bits are written with the values of the following bits of 
the Current Processor Status Register: ROM Enable (RE), Physical 
Addressing/Instructions (PI), and Supervisor Mode (SM). 

Figure 4-4 Branch Target Cache Memory Organization (CO = 1) 
[Diagram: Set 0 and Set 1 each contain Blocks 0 through 63. Each block holds Valid 
bits, a Space ID, an Address Tag, the Target Instruction, and Target + 1. Line n 
consists of Block n of both sets.] 

There are four Valid bits per block, corresponding to the four words available per block 
when the CO bit of the Configuration Register is 0. Cache invalidation instructions 
make it possible to reset all Valid bits in a single processor cycle. However, for the 
Invalidate instruction, the Valid bits are not reset until the next branch is executed. 

4.2.2.2 BRANCH TARGET CACHE MEMORY OPERATION 

It is possible to disable the operation of the Branch Target Cache memory via the 
Branch Target Cache Memory Disable (CD) bit of the Configuration Register. If the 
CD bit is 1, all Branch Target Cache entries are made to appear invalid. If the CD bit 
is 0, there is no effect on Branch Target Cache memory entries. However, note that a 
change in the CD bit does not take effect until after the next non-sequential instruction 
fetch occurs. 

When the Branch Target Cache memory is disabled, it continues to operate as de- 
scribed in this section. However, entries are made to appear invalid, even though they 
may be valid. If the Branch Target Cache memory is enabled after a period of being 
disabled, its contents reflect the most recent instruction execution, and it operates 
accordingly. 

The Branch Target Cache memory lookup process is diagrammed in Figure 4-5. A 
given branch target sequence may be contained in one of two cache blocks, where 
these blocks are in the same line. The sequence is contained in the line whose 
number is given by bits 5-2 of the address of the first instruction of the sequence. A 
given branch target sequence is in a given cache block only if the following conditions 
are met: 

1. Bits 31-6 of the address for the first instruction in the sequence match the 
corresponding bits in the Address Tag associated with the block. 

2. The address of the first instruction in the block has a valid translation in the 
Memory Management Unit, if it is a virtual address. 

3. The instruction address space indicated by the Current Processor Status Register 
(RE, PI, and SM bits) matches the address space indicated by the Space ID field. 

4. The CD bit of the Configuration Register was 0 for the previous non-sequential 
instruction fetch. 

Note that it is not required that all instructions in the sequence be present in the 
cache for the block to be considered valid. 

In addition to the above requirements, the Valid bit must be 1 for any instruction re- 
trieved from the cache. 
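
The four conditions and the per-word Valid check can be expressed as a pair of 
predicates. This is a minimal sketch, not a description of the hardware: the class 
and function names are hypothetical, the Space ID is modeled as an (RE, PI, SM) 
tuple, and the tag is taken to be address bits 31-6 as stated in condition 1. 

```python
from dataclasses import dataclass, field


@dataclass
class BtcBlock:
    address_tag: int = 0                  # 26 bits: address bits 31-6
    space_id: tuple = (0, 0, 0)           # (RE, PI, SM) captured when filled
    valid: list = field(default_factory=lambda: [False] * 4)


def block_matches(block, target_addr, cps_space, cd_at_last_branch,
                  translation_ok=True):
    """True if `block` holds the sequence that starts at `target_addr`."""
    tag = (target_addr >> 6) & 0x3FFFFFF
    return (
        block.address_tag == tag          # condition 1: Address Tag match
        and translation_ok                # condition 2: valid MMU translation
        and block.space_id == cps_space   # condition 3: Space ID match
        and cd_at_last_branch == 0        # condition 4: cache was enabled
    )


def instruction_usable(block, word_index):
    """An individual instruction is usable only if its own Valid bit is 1."""
    return block.valid[word_index]
```

A block can therefore match while some of its words are still invalid; those words 
are obtained by other means, as described in Section 4.2.2.4. 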

Whenever a non-sequential fetch occurs (either for a branch instruction, an interrupt 
or a trap), the address for the fetch is presented to the Branch Target Cache memory 
at the same time that the address is translated by the Memory Management Unit. If 
the target instruction for the non-sequential fetch is in the cache, it is presented for 
decoding in the next cycle. This instruction is always the first instruction of the cache 
block, and its address matches the cache tag. Subsequent instructions in the cache 
are presented for decoding as required in subsequent cycles. However, their ad- 
dresses do not necessarily match the Address Tag. 

4.2.2.3 BRANCH TARGET CACHE MEMORY REPLACEMENT 

On a non-sequential fetch, if the target instruction is not found in the Branch Target 
Cache memory, the address of the fetch selects a line to be used to store the 
instruction sequence of the new branch target. The replacement block within the line 
is selected at random, based on the processor clock. Random replacement has 
slightly better performance than least-recently-used replacement, and has a simpler 
implementation. 

To replace the selected entry, all Valid bits associated with the entry are reset, the 
Address Tag is set with the appropriate address bits of the first instruction in the new 
sequence, and the Space ID bits are set according to the Current Processor Status 
Register. Instructions from the new fetch stream are stored into the selected cache 
block as they are issued to the decode stage. The first instruction is stored into the 
first word of the block, the second instruction is stored into the second word, and so 
on up to a maximum of four instructions. The Valid bit for each word is set as the 
instruction is stored. 
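
The replacement sequence can be sketched as follows. The names are hypothetical, 
a block is modeled as a plain dictionary, and a software random-number generator 
stands in for the clock-based selection, whose exact mechanism is not described here. 

```python
import random


def choose_victim(num_sets=2, rng=random):
    """Pick the set whose block is replaced (stand-in for clock-based choice)."""
    return rng.randrange(num_sets)


def fill_block(block, target_addr, cps_space, issued, words_per_block=4):
    """Replace `block` with the sequence starting at `target_addr`.

    `issued` lists the instructions issued to decode before the fill stops.
    """
    block["valid"] = [False] * words_per_block         # reset all Valid bits
    block["address_tag"] = (target_addr >> 6) & 0x3FFFFFF
    block["space_id"] = cps_space                      # (RE, PI, SM) from the CPS
    block["words"] = [None] * words_per_block
    for i, insn in enumerate(issued[:words_per_block]):
        block["words"][i] = insn                       # stored in issue order
        block["valid"][i] = True                       # Valid bit set per word
    return block
```

If fewer instructions are issued than the block can hold, the remaining words stay 
invalid, which is the partially filled case discussed in Section 4.2.2.4. 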

4.2.2.4 SPECIAL CASES OF BRANCH TARGET CACHE MEMORY ENTRIES 

If a branch instruction appears as one of the first two instructions in a branch target 
sequence, the branch is executed before the Branch Target Cache memory block is 
filled. In this case, the cache block contains fewer than four valid instructions. The 
final valid instruction is the delay instruction of the branch. 

When a block is only partially filled due to a branch within the block, the behavior of 
the cache during subsequent executions of the instructions in the block depends on 
the outcome of this branch. 

If the branch is subsequently successful, then the instructions following the delay 
instruction of the branch are not needed, and the fact that they are not contained in 
the cache is irrelevant. 

If the branch is subsequently unsuccessful, then the instructions following the delay 
instruction are required, and must be fetched externally. In this case, a required entry 
has a Valid bit of 0. When the invalid entry is encountered, the Program Counter is 
used to create an external instruction fetch for the missing instruction; this fetch is 
called a demand fetch. When the fetch completes, the instruction is stored in the 
cache location that was previously invalid, and the Valid bit for this entry is set. 

Since an instruction sequence in a four-word (or two-word) cache block is not neces- 
sarily aligned on a four-word (respectively, two-word) address boundary, a virtual- 
page address boundary may be crossed for the sequence in the cache. The proces- 
sor does not prefetch instructions beyond this boundary, so the cache block is only 
partially filled in this case. If the processor requires instructions beyond the boundary, 
it creates a fetch for them as described above for the case of a branch instruction in 
the cache block. 

When a fetch is created for a page-boundary crossing, this fetch is treated as a 
non-sequential fetch; a new cache block is allocated, and the first instructions at the 
boundary are placed into the new cache block as they are returned by the instruction 
memory. Subsequent references to the original cache block also encounter an invalid 
instruction at the page boundary, and also create a special fetch for this instruction. 
However, since the instructions beyond this boundary are in the Branch Target Cache 
memory, subsequent boundary crossings do not incur the instruction-fetch latency. 

Figure 4-5 Branch Target Cache Memory Lookup Process (CO = 0) 

4.2.3 Non-Sequential Instruction Fetches 

When a non-sequential instruction fetch occurs, the Memory Management Unit 
performs an address translation for the target instruction, if address translation is 
enabled. If the address translation is valid, and the target of the fetch is not in the 
Branch Target Cache memory, an external instruction fetch is initiated. If there is a 
Translation Look-Aside Buffer (TLB) miss or memory-protection violation on this 
address, fetching is not initiated. 

4.2.3.1 INSTRUCTION FETCH-AHEAD 

When a non-sequential fetch occurs, if the target of the fetch is found in the Branch 
Target Cache memory, the processor normally begins fetching instructions beyond 
the valid instructions in the target block. This behavior is termed fetch-ahead. The 
Valid bits of the target block and the CO bit of the Configuration Register determine 
the address of the request, relative to the address of the target instruction. 

The computation required to obtain the address for the fetch-ahead is performed in 
parallel with address translation, by a 6-bit adder called the Fetch-Ahead Adder (see 
Figure 4-1). 

The Fetch-Ahead Adder can overflow during the address computation for the 
fetch-ahead, as indicated by a carry out of the Fetch-Ahead Adder. Here, a page 
boundary may have been crossed, making the address translation (which is performed 
concurrently) invalid. In this case, fetch-ahead is not initiated. 

If fetch-ahead is not initiated for an instruction that the processor eventually requires, 
this fetch is restarted on the cycle in which the missing instruction is required, using a 
demand fetch. The Program Counter is used, guaranteeing that the proper instruction 
address is used. 

The Program Counter Unit, shown in Figure 4-6, forms and sequences instruction 
addresses for the Instruction Fetch Unit. It contains the Program Counter (PC), the 
Program-Counter Multiplexer (PC MUX), the Return Address Latch, and the Program- 
Counter Buffer (PC Buffer). 

The PC forms addresses for sequential instructions executed by the processor. The 
master of the PC Register, PC L1, contains the address of the instruction being 
fetched in the Instruction Fetch Unit. The slave of the PC Register, PC L2, contains 
the next sequential address, which may be fetched by the Instruction Fetch Unit in the 
next cycle. 

Figure 4-6 

The Return Address Latch passes the address of the instruction following the delay 
instruction of a call to the register file. This address is the return address of the call. 

The PC Buffer stores the addresses of instructions in various stages of execution 
when an interrupt or trap is taken. The registers in this buffer, Program Counters 0, 
1, and 2 (PC0, PC1, and PC2) and Shadow Program Counters 0, 1, and 2 (SPC0, 
SPC1, and SPC2), are normally updated from the PC as instructions flow through 
the processor pipeline. 

When an interrupt or trap is taken, the Freeze (FZ) bit in the Current Processor Status 
Register is set, holding the quantities in the PC Buffer. When the FZ bit is set, PC0, 
PC1, and PC2 contain the addresses of the instructions in the decode, execute, and 
write-back stages of the pipeline, respectively. The Shadow Program Counters 
continue to operate and continue to update from the PC unless a Monitor trap occurs. 

Upon the execution of an interrupt return, the target instruction stream is restarted 
using the instruction addresses in PC0 and PC1 (or SPC0 and SPC1, upon return 
from a Monitor trap). Two registers are required here because the processor 
implements delayed branches. An interrupt or trap may be taken when the processor 
is executing the delay instruction of a branch and decoding the target of the branch. 
This discontinuous instruction sequence must be restarted properly upon an interrupt 
return. Restarting the instruction pipeline using two separate registers correctly 
handles this special case; in this case PC1 (or SPC1) points to the delay instruction of 
the branch, and PC0 (or SPC0) points to its target. PC2 (SPC2) does not participate 
in the interrupt return, but is included to report the addresses of instructions causing 
certain exceptions. 
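
The need for two restart addresses can be seen in a small model. This is a sketch 
under stated assumptions: the function name is hypothetical, instructions are taken 
to be 4 bytes long, and fetching is assumed to continue sequentially after the 
instruction addressed by PC0. 

```python
def restart_stream(pc0, pc1, count=4):
    """Addresses fetched after an interrupt return, oldest instruction first.

    pc1 addresses the instruction that was in execute, pc0 the one in decode.
    ASSUMPTION: fetching continues sequentially (4-byte steps) after pc0.
    """
    stream = [pc1, pc0]
    while len(stream) < count:
        stream.append(stream[-1] + 4)
    return stream[:count]
```

For an interrupt taken while a delay instruction at 0x104 is executing and a branch 
target at 0x2000 is being decoded, restart_stream(0x2000, 0x104) produces 0x104, 
0x2000, 0x2004, 0x2008. A single restart address could not describe this sequence. 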

The PC is not defined as a special-purpose register. It cannot be modified or 
inspected by instructions. Instead, the interrupting and restarting of the pipeline is 
done by the PC Buffer registers PC0 and PC1 or SPC0 and SPC1. 

4.3 EXECUTION UNIT 

The Execution Unit performs most of the operations required for instruction execution. 
It incorporates the Register File, the Address Unit, the Arithmetic/Logic Unit, the Field 
Shift Unit, the Prioritizer, and the Floating-Point Unit. 

4.3.1 Register File 

The general-purpose registers are implemented by a four-port, 192-location Register 
File. The Register File performs two read accesses and two write accesses in a single 
cycle. If a location is written and read in the same cycle, the data read is that written 
during the cycle. 

The Register Address Generator, shown in Figure 4-7, computes register numbers for 
operands, detects pipeline data dependencies, and calculates register-number se- 
quences for load-multiple and store-multiple operations. 

4.3.1.1 REGISTER ADDRESSING 

Register numbers for instruction operands are computed during the decode stage. 
This computation is performed during the first half of a cycle, and the operands are 
read in the second half of a cycle. Three multiplexers select two source-operand 
register numbers and a single destination register number for any given instruction. 

If the most-significant bit of a register number is 0, the global registers are selected, 
and the register number is used directly as a register address. If the most-significant 
bit of the register number is 1, the local registers are selected, and the lower seven 
bits of the register number are added to the Stack Pointer to form the desired local 
register address. 
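
The register-number decoding can be sketched as follows. The function names are 
hypothetical. The wrap of the sum to seven bits is an assumption, since the text 
only says the two values are added. The Stack Pointer helper uses bits 8-2 of Global 
Register 1, and register number 0 (the indirect-pointer case) is not modeled. 

```python
def stack_pointer_from_gr1(gr1: int) -> int:
    """Stack Pointer: a 7-bit copy of bits 8-2 of Global Register 1."""
    return (gr1 >> 2) & 0x7F


def decode_register_number(reg_num: int, stack_pointer: int):
    """Map an 8-bit register number to ('global', n) or ('local', index)."""
    if not 0 < reg_num <= 0xFF:
        # Register number 0 selects an indirect pointer and is not modeled.
        raise ValueError("register number must be in 1..255")
    if (reg_num & 0x80) == 0:
        return ("global", reg_num)         # used directly as the address
    # ASSUMPTION: the 7-bit sum wraps modulo 128.
    return ("local", ((reg_num & 0x7F) + stack_pointer) % 128)
```

Because local register numbers are relative to the Stack Pointer, the same 
instruction encoding refers to different physical locations as the Stack Pointer 
changes. 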

The Stack Pointer is a hardware shadow-copy of bits 8-2 of Global Register 1, and is 
updated whenever Global Register 1 is written with the result of an Arithmetic or 
Logical instruction. Global Register 1 is implemented as a full 32-bit register in the 
Register File; this register is distinct from the 192 locations that implement 
general-purpose registers. 

If a register number is zero (i.e., if Global Register 0 is specified as an operand), the 
Register Address Generator selects the content of an indirect pointer as the register 
number. There are three indirect pointers, and each appears as a special-purpose 
register. 

Figure 4-7 Register File and Register Address Generator 

4.3.1.2 PIPELINE DATA DEPENDENCIES 

For the Register File, the pipeline delay in result write-back, compared to operand 
access, creates situations where a result from a previous operation may be required 
as an operand before it has been written into the register file. When one of these 
situations arises, a pipeline data dependency is said to exist. 

The register numbers for the write-back of instruction results require two buffering 
registers, so that they are presented to the Register File during the write-back stage. 
In addition, the register numbers for uncompleted load operations are held until the 
load completes (these register numbers are held in the ETR Register shown in 
Figure 4-6). 

Register read-address comparators detect pipeline data dependencies, and activate 
multiplexers to forward data directly to the required functional unit, without waiting for 
the data to be written to the register file. The comparators activate the forwarding 
multiplexers if they detect one of the following situations: 

1. One of the source register numbers matches the destination register number of 
the immediately previous instruction. 

2. One of the source register numbers matches the target register number (in the 
ETR) of an outstanding load. 

In the first case listed above, the result of the execute stage is selected as an 
operand, instead of the output of the Register File port for which the forwarding 
condition is detected. In the second case, data from the channel is selected. The 
comparison may cause the processor to enter the Pipeline Hold mode if the load has 
not completed. However, data forwarding allows data from the Data Bus to be used 
immediately, in the cycle after it is returned on the Data Bus. 
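
The operand-source selection described above can be sketched as a single function. 
All names are hypothetical. The priority given to the previous instruction's result 
when both comparisons match is an assumption, not something the text states. 

```python
def select_operand_source(src_reg, prev_dest, etr_target,
                          load_outstanding, load_data_ready):
    """Return where a source operand is taken from for one register port."""
    # Case 1: the immediately previous instruction writes this register.
    # ASSUMPTION: this case takes priority if both comparisons match.
    if prev_dest is not None and src_reg == prev_dest:
        return "execute_result"
    # Case 2: an outstanding load targets this register (number held in ETR).
    if load_outstanding and src_reg == etr_target:
        return "channel_data" if load_data_ready else "pipeline_hold"
    # No dependency detected: use the Register File output.
    return "register_file"
```

The "pipeline_hold" result stands for the Pipeline Hold mode, which lasts until the 
load data is returned and can be forwarded. 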

The content of the ETR is further compared to the register numbers supplied to the 
write-back stage. If the target register for a load is written with the result of an 
overlapped instruction, the Not Needed (NN) bit in the Channel Control Register is set. 
If the comparators determine that the NN bit should be set, they also inhibit the 
write-back of load data on the completion of the load. The NN bit inhibits the 
restarting of the load operation if an exception occurs. 

The Am29050 microprocessor Floating-Point Unit contains hardware comparable to 
that described above for detecting dependencies on floating-point operations to for- 
ward data, cause a pipeline hold, or prevent the write-back of a floating-point opera- 
tion, as required. The Floating-Point Unit also manages write-back register numbers, 
and presents the register number of a result to the register file at the appropriate time. 

4.3.1.3 LOAD-MULTIPLE AND STORE-MULTIPLE SEQUENCES 

During load-multiple and store-multiple operations, sequential register numbers are 
computed by an incrementer associated with the ETR/DTR pair shown in Figure 4-7. 
In the case of store multiple, the register numbers are supplied as read addresses to 
the Register File by the incrementer. The read addresses are latched by the DTR so 
that they may be incremented further. In the case of load multiple, target register 
numbers are held by the ETR as for any other load. However, the ETR is set with a 
sequence of incremented addresses in this case. 

4.3.2 Address Unit 

The Address Unit, shown in Figure 4-8, computes addresses for branch target 
instructions, and load-multiple and store-multiple sequences. It also assembles 
instruction-immediate data and creates addresses for restarting terminated 
instruction prefetch streams. 

The Address Unit consists of a 30-bit adder, the Decode PC Register, the ADRF 
Latch, and logic for formatting instruction-immediate data and generating the con- 
stants zero and one. The Decode PC Register holds the address of the instruction in 
the decode stage of the pipeline. 

4.3.2.1 BRANCH TARGET ADDRESSES 

Branch target addresses are either fetched from the Register File or calculated by the 
Address Unit. The Address Unit calculates target addresses during the decode stage 
of branch instructions. These addresses are of two possible types: 

1. PC Relative: the current PC value is added to a sign-extended, 16-bit offset field 
from the branch instruction. 

2. Absolute: a zero-extended, 16-bit field of the branch instruction is used directly as 
an instruction address. 

For each of the above types of addresses, the 16-bit instruction field is aligned on a 
word address-boundary (i.e., it is shifted left by two bits). 

To calculate the branch target address, the Address Unit formats the 16-bit instruction 
field as required and presents it to the 30-bit adder. This adder adds the formatted 
field either to the contents of the Decode PC Register or to zero, as required for 
PC-relative or absolute addresses, respectively. 

Figure 4-8 Address Unit 

4.3.2.2 LOAD-MULTIPLE AND STORE-MULTIPLE ADDRESSES 

During the execution of Load Multiple and Store Multiple instructions, addresses for 
the access sequence are held in the ADRF Latch. An address in the ADRF Latch is 
updated, as required for an access in the sequence, by the 30-bit adder in the 
Address Unit. The formatting logic creates a constant offset of one for the update. 
The updated address is presented to the Memory Management Unit for translation 
and protection checking, and is placed into the ADRF Latch for further address 
computations. 

For load-multiple and store-multiple operations performed using burst-mode 
accesses, the physical address for each access does not appear on the Address Bus, 
but the addresses are maintained in the processor so that they may be used to restart 
the burst-mode access upon preemption. 

4.3.2.3 SPECIAL INSTRUCTION FETCHES 

As discussed in Section 4.2, the processor must create demand fetches when it en- 
counters an invalid instruction in the middle of a Branch Target Cache memory block, 
or when it attempts to fetch an instruction from an Instruction Prefetch Buffer location 
which is in the Available state. The Address Unit routes the address for this fetch in a 
manner similar to the routing of a branch target address. It passes the contents of the 
Decode PC (containing the required instruction address) through the 30-bit adder, 
adding it to zero. This address is presented to the Memory Management Unit for 
translation, and is used in the Instruction Fetch Unit to complete the fetch. 
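
The three uses of the 30-bit adder described in Section 4.3.2 can be sketched 
together. This is an illustrative model only: the function names are hypothetical, 
and the adder is assumed to operate on word addresses (byte-address bits 31-2), 
which is equivalent to shifting the 16-bit field left by two bits. 

```python
WORD_MASK = (1 << 30) - 1     # the adder is 30 bits wide


def adder30(a_words: int, b_words: int) -> int:
    """30-bit add on word addresses; any carry out of bit 29 is discarded."""
    return (a_words + b_words) & WORD_MASK


def sign_extend16(field: int) -> int:
    field &= 0xFFFF
    return field - 0x10000 if field & 0x8000 else field


def branch_target(decode_pc: int, field16: int, absolute: bool) -> int:
    """Byte address of a branch target computed in the decode stage."""
    if absolute:
        # Zero-extended field added to zero.
        return adder30(0, field16 & 0xFFFF) << 2
    # Sign-extended field added to the Decode PC.
    return adder30(decode_pc >> 2, sign_extend16(field16)) << 2


def next_multiple_address(adrf: int) -> int:
    """Load/store-multiple update: ADRF Latch plus a constant offset of one."""
    return adder30(adrf >> 2, 1) << 2


def demand_fetch_address(decode_pc: int) -> int:
    """Demand fetch: the Decode PC passed through the adder, added to zero."""
    return adder30(decode_pc >> 2, 0) << 2
```

For example, branch_target(0x1000, 0xFFFF, absolute=False) gives 0x0FFC, one word 
before the branch, while the same field treated as absolute gives 0x3FFFC. 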

4.3.3 Early Loads 

The early load feature speeds up the execution of load operations by making the 
physical address of the load available at the end of the decode cycle of the load 
instruction. At the beginning of the next cycle, when the load enters the execute 
stage, the physical address appears on the channel. In effect, early loads reduce the 
memory access time by one cycle. 

Early loads can occur in two different ways. Either the physical address of the load is 
available in the Physical Address Cache memory (PAC), or, when an address 
computation immediately precedes the load instruction, the computed physical 
address can be forwarded directly to the channel. The latter method is performed by 
an Early Address Generator (EAG). 

For either type of early load to occur, all of the following conditions must be met: 

1. The operation must be a LOAD, with a general-purpose register, rather than a 
constant, specified as an address source operand. 

2. The operation must load the external word addressed by the source register, 
rather than transfer a word from a coprocessor. 

3. The source register can be neither the IPB specifier nor the Stack Pointer. 

4. The load instruction must not disable address translation for the access (PA = 0). 
In other words, address translation must remain under the control of the PD bit of 
the Current Processor Status Register. 

5. The load instruction must not force the access to be made in the User mode 
(UA = 0). The program mode must remain under the control of the SM bit of the 
Current Processor Status Register. 
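
The five conditions can be collected into one predicate. The sketch below is 
illustrative; the class and field names are hypothetical and simply restate the 
conditions in the list above. 

```python
from dataclasses import dataclass


@dataclass
class DecodedLoad:
    is_load: bool                       # condition 1: a LOAD operation ...
    address_from_register: bool         # ... with a register address source
    from_coprocessor: bool              # condition 2: coprocessor transfer
    address_reg_is_ipb: bool            # condition 3: IPB specifier
    address_reg_is_stack_pointer: bool  # condition 3: Stack Pointer
    pa: int                             # condition 4: PA bit of the load
    ua: int                             # condition 5: UA bit of the load


def early_load_allowed(ld: DecodedLoad) -> bool:
    """True only if every early-load condition is met."""
    return bool(
        ld.is_load and ld.address_from_register
        and not ld.from_coprocessor
        and not (ld.address_reg_is_ipb or ld.address_reg_is_stack_pointer)
        and ld.pa == 0
        and ld.ua == 0
    )
```

Meeting these conditions only makes a load eligible; an early load then occurs 
through either the PAC or the EAG, as described in the following sections. 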

4.3.3.1 PHYSICAL ADDRESS CACHE MEMORY 

The PAC is a four-entry, direct-mapped cache. Each PAC entry consists of two words. 
PAC entries cannot be accessed by software. The first word (Word 0) is the 
Translated Physical Address, while the second word (Word 1) contains a Register Tag 
and various control bits. The PAC entry registers are illustrated in Figure 4-9 and 
Figure 4-10. 

PAC Entry Word 0 contains the 32-bit physical address of the load. The Valid (V) bit of 
PAC Entry Word 1 is 1 if the physical address is a valid translation. The IO bit is set 
equal to the IO bit of the TLB or RMU translation of the address, if address translation 
is in effect. Otherwise, the IO bit in the PAC entry is ignored. The Register Tag field 
contains the number of the register that holds the memory address of the load; its 
value is taken directly from the RB field of the load instruction. 

Figure 4-9 

Figure 4-10 PAC Entry Word 1 

The value of the PGM field is taken from the PGM field of the TLB or RMU translation 
of the address, if address translation is in effect. Otherwise, the PGM field in the PAC 
entry contains zeros. 

The PAC supports the following operations: 

• Searching for a valid translation for the load in the decode stage. 

• Invalidating by clearing all Valid bits. 

• Invalidating a single entry by clearing its Valid bit. 

• Updating an existing entry by modifying its Translated Physical Address field and 
setting its Valid bit. 

• Replacing an existing entry with a new entry and setting its Valid bit. 

When a load is in decode, the PAC is searched for a valid entry corresponding to the 
memory address register of the load (specified by the RB field). The PAC entry is 
selected by the two least significant bits of the RB field of the load instruction. If a 
valid translation for the load is found in the PAC, a PAC hit results and the physical 
address from the PAC is used for the access. This address is available one cycle 
earlier than if the address translation were to wait until the execute stage, and an 
early load occurs. 

In the case of a PAC miss, the newly translated physical address is written to the 
PAC, replacing an existing PAC entry. Only load instructions can replace PAC entries. 
This address is then available for subsequent instructions that use the same address 
register. 

An individual PAC entry is invalidated if any instruction modifies the register whose 
translated address is cached in the PAC entry. 
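
The lookup, replacement, and invalidation behavior described above can be modeled 
in a few lines. This is a minimal sketch with hypothetical names; only the Valid 
bit, Register Tag, and Translated Physical Address are modeled, and the IO and PGM 
fields are omitted. 

```python
class PhysicalAddressCache:
    """Four-entry, direct-mapped cache indexed by the low two bits of RB."""

    ENTRIES = 4

    def __init__(self):
        self.entries = [self._empty() for _ in range(self.ENTRIES)]

    @staticmethod
    def _empty():
        return {"valid": False, "reg_tag": None, "phys": None}

    def lookup(self, rb):
        """Return the cached physical address on a PAC hit, else None."""
        entry = self.entries[rb & 0x3]
        if entry["valid"] and entry["reg_tag"] == rb:
            return entry["phys"]
        return None

    def fill(self, rb, phys):
        """On a miss, the new translation replaces the indexed entry."""
        self.entries[rb & 0x3] = {"valid": True, "reg_tag": rb, "phys": phys}

    def on_register_write(self, reg):
        """Invalidate the entry whose address register is being modified."""
        entry = self.entries[reg & 0x3]
        if entry["reg_tag"] == reg:
            entry["valid"] = False

    def invalidate_all(self):
        for entry in self.entries:
            entry["valid"] = False
```

Two address registers whose numbers share the same two low-order bits compete for 
one entry, so alternating loads through such registers would miss repeatedly in 
this model. 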

The entire PAC is invalidated if any of the following occurs: 

• The RESET or WARN input is asserted. 

• The Stack Pointer (GR1) is modified. 

• The processor executes an MTSR instruction whose destination is the MMU 
Configuration Register, the Current Processor Status Register, or a Region 
Mapping Unit register. 

• The processor takes a trap. 

• The processor executes an IRET, IRETINV, MTTLB, or LOADM instruction. 

• The processor executes an instruction which updates the Register File using an 
indirect pointer. 

4.3.3.2 EARLY ADDRESS GENERATOR 

When a load is being decoded, its address can be translated early if the instruction in 
the execute stage is computing the associated address. An Early Address Generator 
(EAG) is constantly translating the results of certain instructions during execution. If a 
load happens to refer to an address being computed and translated by the EAG, the 
translated address is available for use at the beginning of the execute stage of the 
load. In this case, an early load occurs. 

Because the EAG must compute and translate an address in a single cycle, it only 
operates on a simple subset of instructions: CONST, CONSTH, ADD, ADDS, and 
ADDU. These instructions, though simple, are frequently used to compute load 
addresses. For the add instructions, the EAG cannot translate the address if the add 
causes the input values to cross a page boundary (for example, if there is a carry-out 
of bit 11 for 4-kb pages). The page boundary depends on the page size, and, for 
addresses translated by a Region Mapping Unit, the page size is treated as 64 kb (the 
minimum region size). 
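
The carry-out check in the example above can be written directly. The function 
name is hypothetical, and the sketch models only the carry out of the page-offset 
bits, not any other checks the EAG may perform. 

```python
def carry_out_of_page_offset(a: int, b: int, page_bits: int = 12) -> bool:
    """True if adding a and b carries out of the page-offset field.

    page_bits = 12 corresponds to 4-kb pages (carry out of bit 11);
    page_bits = 16 corresponds to the 64-kb size used for RMU translations.
    """
    mask = (1 << page_bits) - 1
    return ((a & mask) + (b & mask)) > mask
```

With 4-kb pages, adding 4 to 0x0FFC carries out of bit 11, while adding 4 to 0x0FF8 
does not. 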

If the EAG computes and translates an address for an instruction whose destination 
register is mapped by the PAC, the PAC entry is updated with the new translation 
whether or not there is a load in decode and whether or not the Valid bit is set for the 
PAC entry. This allows the PAC to be updated for common addressing patterns, such 
as incremented addresses, and increases the effectiveness of the PAC. However, this 
update can occur only if there is not a load using another PAC entry. If there is such a 
load, the entry associated with the EAG destination is invalidated. 

4.3.3.3 INHIBITION OF EARLY LOADS 

Early loads cause contention for the Address Bus between instruction and data ad- 
dresses when a jump or call appears immediately before a load instruction. In this 
case, the jump instruction uses the Address Bus during the execute stage of the load 
instruction, and the early load is inhibited. 

Early loads are also inhibited if a trap or interrupt is taken during the decode stage of 
the load. 

4.3.4 Arithmetic/Logic Unit 

The Arithmetic/Logic Unit (ALU) performs 32-bit arithmetic and logical operations. The 
arithmetic operations consist of addition, subtraction, addition with carry-in, subtrac- 
tion with carry-in, and primitives for multiplication and division. Instructions specify 
whether or not a trap is generated on signed or unsigned arithmetic overflow. 

The A and B operands may be complemented independently in the ALU; 
complementors for data into the ALU are controlled by instructions. This allows 
subtraction and reverse subtraction to be formed from addition, and allows certain 
logical operations (e.g., XNOR) to be formed from other basic operations (e.g., XOR). 
The carry-in to the ALU can be 0, 1, or the value of the Carry bit in the ALU Status 
Register. The carry-out of the ALU is used in overflow detection, unsigned 
comparisons, multiplication, and division. The ALU carry-out is stored in the ALU 
Status Register for multiprecision arithmetic. 

The ALU also evaluates the relational expressions equal-to, not-equal-to, less-than, 
less-than-or-equal-to, greater-than, and greater-than-or-equal-to. Each comparison 
computes a Boolean corresponding to the relation between two integer operands or 
creates a trap (possibly) based on this relation. The Boolean constants FALSE and 
TRUE are represented by 0 and 1, respectively, in the most-significant bit of a word. 

The relational operators may be applied to either signed or unsigned operands. For 
unsigned operands, these operators are implemented by recognizing that the ALU 
carry-out is the Boolean result of an unsigned comparison if the two numbers are 
subtracted and the carry-in is appropriately controlled. For comparison of signed 
numbers, the true sign of the result (i.e., the resulting sign exclusive-ORed with the 
overflow indication) gives the result of the compare. The relational operators equal-to 
and not-equal-to are independent of the data type. These operators are implemented 
by a 32-bit equal-to-zero comparator. 
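
The way comparisons fall out of the adder can be illustrated with a small model. 
The function names are hypothetical, and the sketch shows only one arrangement of 
complement and carry-in (A plus the complement of B plus 1) that produces these 
results. 

```python
MASK32 = 0xFFFFFFFF


def alu_add(a, b, carry_in, invert_b=False):
    """32-bit add returning (result, carry_out, signed_overflow)."""
    a &= MASK32
    b = (~b & MASK32) if invert_b else (b & MASK32)
    total = a + b + carry_in
    result = total & MASK32
    carry_out = total >> 32
    # Overflow: both adder inputs share a sign that the result does not.
    overflow = ((~(a ^ b) & (a ^ result)) >> 31) & 1
    return result, carry_out, overflow


def unsigned_ge(a, b):
    """a >= b (unsigned): carry-out of a + ~b + 1."""
    _, carry, _ = alu_add(a, b, 1, invert_b=True)
    return bool(carry)


def signed_lt(a, b):
    """a < b (signed): sign of a - b exclusive-ORed with overflow."""
    result, _, overflow = alu_add(a, b, 1, invert_b=True)
    return bool(((result >> 31) & 1) ^ overflow)


def boolean_word(flag):
    """Boolean result placed in the most-significant bit of a word."""
    return 0x80000000 if flag else 0
```

Using a carry-in of 0 instead of 1 in the unsigned case turns the carry-out into 
the strict relation a > b, which is one way the carry-in can be "appropriately 
controlled". 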

The ALU also supports the 32-bit logical operations AND, OR, NAND, NOR, 
AND-NOT, OR-NOT, XOR, and XNOR. 



4.3.5 Field Shift Unit 



The Field Shift Unit contains a Funnel Shifter, logic for performing word extracts, and 
logic for performing byte and half-word extracts and inserts. 

The Funnel Shifter performs N-bit shifts, where N is an integer between 0 and 31, 
inclusive, given by a 5-bit shift count. The source of the shift count is specified by the 
shift instruction; the shift count is given either by a constant field in the shift 
instruction, bits 4-0 of a general-purpose register specified by the shift instruction, or 
by the 5-bit Funnel Shift Count field in the ALU Status Register. 

Both arithmetic and logical shifts are supported, with the difference being the values
stored into vacated bits: arithmetic shifts fill these bits with the sign bit of the operand,
while logical shifts fill them with zero-bits. Arithmetic shifts are possible only for right
shifts.
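
The difference between the two fill rules can be shown with a small Python model of 32-bit shifts. This is a sketch only; the function names are hypothetical, and the shift count is masked to 5 bits to match the 0-to-31 range stated above.

```python
MASK32 = 0xFFFFFFFF

def shift_right_logical(word, count):
    """Vacated high-order bits are filled with zero-bits."""
    count &= 0x1F
    return (word & MASK32) >> count

def shift_right_arithmetic(word, count):
    """Vacated high-order bits are filled with the operand's sign bit."""
    count &= 0x1F
    word &= MASK32
    shifted = word >> count
    if (word & 0x80000000) and count:
        shifted |= (MASK32 << (32 - count)) & MASK32   # replicate the sign bit
    return shifted

def shift_left_logical(word, count):
    """Left shifts are always logical: vacated low-order bits are zero."""
    count &= 0x1F
    return (word << count) & MASK32
```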

The Field Shift Unit operates on 32-bit words, 16-bit half-words, and 8-bit bytes. For
byte operations, the position of a byte operand within a word is supplied by the 2-bit
Byte Pointer (BP) field of the ALU Status Register. For half-word operations, the
position of a half-word operand is given by the most-significant bit of the BP field; the
least-significant bit is ignored. The processor supports either left-to-right or right-to-left
byte and half-word ordering within a word.
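
As an illustration of how the BP field selects an operand, the following Python sketch extracts a byte or half-word from a 32-bit word. The function names and the left_to_right parameter are hypothetical; the sketch assumes that, with left-to-right ordering, position 0 is the most-significant byte or half-word, and that right-to-left ordering reverses this.

```python
def extract_byte(word, bp, left_to_right=True):
    """Select one of four bytes using the 2-bit Byte Pointer value."""
    bp &= 0x3
    index_from_lsb = (3 - bp) if left_to_right else bp
    return (word >> (8 * index_from_lsb)) & 0xFF

def extract_half_word(word, bp, left_to_right=True):
    """Select one of two half-words using only the most-significant BP bit."""
    half = (bp >> 1) & 1                    # least-significant BP bit is ignored
    index_from_lsb = (1 - half) if left_to_right else half
    return (word >> (16 * index_from_lsb)) & 0xFFFF
```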

4.3.6 Prioritizer 

The prioritizer counts the number of leading zero-bits in an operand. The count of the
number of zero-bits up to the leading 1 is stored in the specified destination register. If
the operand does not contain a 1, the value stored is 32.
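
The prioritizer's result can be modeled in one line of Python. This is a behavioral sketch with a hypothetical function name, using the built-in int.bit_length method.

```python
def count_leading_zeros(operand):
    """Return the number of zero-bits above the leading 1 in a 32-bit operand."""
    operand &= 0xFFFFFFFF
    # bit_length() is 0 for an all-zero operand, which yields the value 32.
    return 32 - operand.bit_length()
```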

4.3.7 Floating-Point Unit 

The Am29050 microprocessor Floating-Point Unit (FPU) has separate addition/ 
subtraction, multiplication, and division/square root pipelines, all of which share a 
common rounding circuit. A block diagram of the Floating-Point Unit is shown in 
Figure 4-11. 

The FPU contains eight functional units: 

Classifier (CL)— Determines operand type for the CLASS instruction.

Denormalizer (DN)— Equalizes the exponent values of two floating-point operands by
right-shifting the significand of the smaller operand.

Adder (AD)— Adds and subtracts the significands of floating-point operands. 

Renormalizer (RN)— Normalizes the result of a floating-point operation by left-shifting
the result's significand until the most significant bit is 1, or until the exponent is 0.

Multiplier (MT)— Performs a 32-bit by 32-bit multiplication, producing a 64-bit result in 
redundant (sum/carry) form. The multiplier performs both floating-point and integer 
multiplications. 

Partial Product Summer (PS)— Converts the redundant multiplier output to binary 
form. Also sums four successive multiplier outputs to form the intermediate result of a 
double-precision multiplication. 

Divide/Square Root Unit (DS)— Iteratively computes floating-point divisions and
square roots.

Round Unit (RU)— Rounds an intermediate result to fit the destination format. The
round unit is also responsible for processing exceptions that occur at the end of an
operation.

Table 4-1. Staging of Floating-Point Operations

Notes: * = Denormalized source operands, or results that need to be denormalized, require
additional cycles for operand wrapping or unwrapping (see Table C-3).

= Optional sequencing (see text).

Table 4-1 shows the functional units used by each operation, and the order in which
they are used. As indicated in the table, some operations may require an additional
cycle for certain types of data inputs:

• When the CONVERT or MTACC instruction is used to convert a denormalized 
single-precision floating-point number to double-precision, an additional cycle is 
used to normalize the operand. 

• When a DADD or FADD instruction receives operands having the potential for
massive cancellation (i.e., whose exponent values differ by 0 or 1, and whose
signs are different), an additional cycle is used to renormalize the intermediate
operand.

• When a DSUB or FSUB instruction receives operands having the potential for
massive cancellation (i.e., whose exponent values differ by 0 or 1, and whose
signs are the same), an additional cycle is used to renormalize the intermediate
operand.

The sequencing shown in the table does not apply to the DDIV, DMUL, FDIV,
FDMUL, FMUL, and SQRT operations if one or more of the input operands is
denormalized, or if the result is denormalized; additional cycles are required for wrapping
and/or unwrapping operands (see Table C-3).
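
The massive-cancellation conditions listed above for the add and subtract instructions can be written as a simple predicate. The following Python sketch is illustrative only; the function name and argument layout are hypothetical, and the exponents are taken as plain integers.

```python
def needs_renormalize_cycle(mnemonic, sign_a, exp_a, sign_b, exp_b):
    """Return True if the operands have the potential for massive cancellation."""
    exponents_close = abs(exp_a - exp_b) <= 1      # exponents differ by 0 or 1
    if mnemonic in ("FADD", "DADD"):
        return exponents_close and sign_a != sign_b
    if mnemonic in ("FSUB", "DSUB"):
        return exponents_close and sign_a == sign_b
    return False                                   # other operations are not covered here
```

In both cases the effective operation is a subtraction of magnitudes that are close in size, which is why the result may need a long left shift to renormalize.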

The Floating-Point Unit can support multiple operations concurrently, the principal
limitation being resource contention. The resources capable of causing contention are
the 32-by-32 Multiplier (MT), the Divide/Square Root Unit (DS), and the Round Unit
(RU).

The 32-by-32 Multiplier and the Divide/Square Root unit can give rise to contention 
because they may be allocated to a single operation for multiple cycles. If one of 
these resources is busy when required by a subsequent instruction, a pipeline hold 
results until the needed resource is free. 

While contention for the Round Unit is not common, there are situations in which two
or more functional units have data for the Round Unit at the same time. In general,
data from the earliest-issued contending operation has the highest priority for access
to RU. Priorities, listed from highest to lowest, are:

1. The Renormalizer (RN) result, if the RU contains a result that needs to be
unwrapped (e.g., a denormalized number in normalized form), and the
Denormalizer (DN) and Adder (AD) Units are also busy. Allowing the RN result to
go to RU allows the wrapped result in RU to be forwarded to DN.

2. The Divide/Square Root Unit (DS). 

3. The Partial Product Summer (PS), if it has a result that has been waiting for at
least one cycle.

4. The Renormalizer (RN), if it has a result that has been waiting for at least one
cycle.

5. The Partial Product Summer (PS). 

6. The Renormalizer (RN). 

7. The Adder (AD). 
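
The priority list above can be read as a fixed-order arbitration. The following Python sketch models that selection under simplifying assumptions: the function and parameter names are hypothetical, each contender is represented only by its unit abbreviation, and the two flags stand in for the conditions attached to the first priority.

```python
def select_round_unit_source(ready, waiting=frozenset(),
                             ru_holds_wrapped=False, dn_ad_busy=False):
    """Pick which unit's data goes to the Round Unit this cycle.

    ready   -- set of unit names ("RN", "DS", "PS", "AD") with data for RU
    waiting -- subset of ready whose result has waited at least one cycle
    """
    # 1. RN wins outright when RU holds a wrapped result and DN and AD are busy.
    if "RN" in ready and ru_holds_wrapped and dn_ad_busy:
        return "RN"
    # 2. The Divide/Square Root Unit.
    if "DS" in ready:
        return "DS"
    # 3 and 4. Results that have already waited, PS before RN.
    for unit in ("PS", "RN"):
        if unit in ready and unit in waiting:
            return unit
    # 5, 6 and 7. Fresh results, in the order PS, RN, AD.
    for unit in ("PS", "RN", "AD"):
        if unit in ready:
            return unit
    return None
```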

As with loads, floating-point operations are fully interlocked; an operation requiring 
the result of a previous operation is prevented from proceeding until that result is 
available. 

In a single cycle, the Register File can transfer to the Floating-Point Unit: 

• One or two double-precision floating-point operands, each of which originates in a 
double-word-aligned register pair. 

• One or two integer or single-precision floating-point operands. 

In a single cycle, the Floating-Point Unit can write one of the following to the Register 
File: 

• A double-precision floating-point result, written to a double-word-aligned register 
pair. 

• One or two integer or single-precision floating-point results.

There is a 64-bit Register File port dedicated to the writing of floating-point results.
These results can be written without interfering with integer operations.

4.4 MEMORY MANAGEMENT UNIT

The Memory Management Unit (MMU) performs all memory-management functions
described in Section 3.6. Address translation is performed during the execute stage of
any load, store, or branch instruction that requires address translation. Address
translation also is performed whenever the processor requires an instruction that has not
been prefetched; as discussed in Section 4.2, address translation is performed in this
case to resolve certain exceptional events that occur during instruction prefetching.

Though the MMU is shared for instruction and data accesses, the processor pipeline 
is arranged so that there is no contention for the MMU. In general, this is the result of 
the instruction-set definition and the fact that instruction prefetch addresses are gen- 
erated by the Instruction Fetch Pointer (see Section 4.2.1). 

An instruction address is normally translated only when a branch is executed. Since 
neither a load nor a store is executed at the same time, there is no contention for 
the MMU. If the Instruction Fetch Pointer overflows, the address translation is de- 
ferred until the Instruction Fetch Unit determines that the processor requires the asso- 
ciated instruction. Since instruction execution cannot occur at this time, the MMU 
cannot be required for the translation of a load or store address, and again there is no 
contention. 

When the processor performs load-multiple and store-multiple operations, the MMU 
translates the address associated with every access. Load-multiple and store-multiple 
address sequencing is performed in the virtual address space, rather than both the 
virtual and physical address spaces, so that only a single address incrementer is 
required. Since the execution of Load Multiple and Store Multiple instructions is not 
overlapped with the execution of other instructions, there is no penalty associated 
with using the MMU for every access. 

The MMU performs address translation in a single cycle. If an address translation is
valid, the results of the translation are placed on the Address Bus along with the
instruction-access or data-access request. In many cases, the address appears on
the Address Bus during the cycle immediately following address translation (it does
not appear if the Address Bus is occupied with another access). This address appears
regardless of the outcome of memory protection checking; this relaxes the
timing constraints on protection checking, which can be performed only after address
translation is complete. If a protection violation is detected, the processor activates
the BINV signal late in the first address cycle for the request.

4.5 PIPELINE HOLD MODE 

The Pipeline Hold mode is activated whenever sequential processor operation cannot
be guaranteed. When this mode is active, the pipeline stages do not advance, and
most internal processor state is not modified. The processor places itself in the
Pipeline Hold mode in the following situations:

1. The processor requires an instruction that has either not been fetched or not been
returned by the external instruction memory.

2. The processor requires data from an in-progress load or floating-point operation,
and the operation has not completed.

3. The processor attempts to execute a load or store instruction while another load
or store is in progress.

4. The processor attempts to execute a floating-point operation and the required
functional unit is busy.

5. The processor must perform a serialization operation as described in Section 3.8.

6. The processor is performing a sequence of load-multiple or store-multiple
accesses. The Pipeline Hold mode in this case prevents further instruction
execution until the completion of the load-multiple or store-multiple sequence.

7. The processor has taken an interrupt or trap, and the first instruction of the
interrupt or trap handler has not entered the execute stage. The Pipeline Hold
mode in this case prevents the processor pipeline from advancing until the
interrupt or trap handler can begin execution.

8. The processor has executed an interrupt return, and the target instruction of the 
interrupt return has not entered the execute stage. The Pipeline Hold mode in this 
case prevents the processor pipeline from advancing until the interrupt return 
sequence is complete. 

The Pipeline Hold mode is exited whenever the causing conditions no longer exist, or
when the WARN or RESET input is asserted.

CHAPTER 5

SYSTEM INTERFACES

The Am29050 microprocessor is pin-compatible with the Am29000 processor. This
chapter describes the attachment of the Am29050 microprocessor to its hardware
environment. It describes the channel, which allows the processor to communicate
with external devices and memories. The Test/Development interface, provided for
hardware development and testing, is also described. In addition, this chapter
includes sections on external interrupts, traps, processor reset, clock generation, and
master/slave checking.

In the signal descriptions of Section 5.1, certain outputs are described as being
3-state or bi-directional outputs. However, all outputs (except MSERR) may be placed
in a high-impedance state by the Test mode. The 3-state and bi-directional terminology
in this section is for those outputs (except SYSCLK) that are disabled when the
processor grants the channel to another master.

5.1 SIGNAL DESCRIPTION

A(31-0) Address Bus (3-State Outputs, Synchronous)

The Address Bus transfers the byte address for all accesses except
burst-mode accesses. For burst-mode accesses, it transfers the
address for the first access in the sequence.

BREQ Bus Request (Input, Synchronous)

This input allows other masters to arbitrate for control of the
processor channel.

BGRT Bus Grant (Output, Synchronous)

This output signals to an external master that the processor is
relinquishing control of the channel in response to BREQ.

BINV Bus Invalid (Output, Synchronous)

This output indicates that the Address Bus and related controls are
invalid. It defines an idle cycle for the channel.

R/W Read/Write (3-State Output, Synchronous)

This signal indicates whether data is being transferred from the
processor to the system, or from the system to the processor.

SUP/US Supervisor/User Mode (3-State Output, Synchronous)

This output indicates the program mode for an access.

LOCK Lock (3-State Output, Synchronous)

This output allows the implementation of various channel and device
interlocks. It may be active only for the duration of an access, or
active for an extended period of time under control of the Lock bit in
the Current Processor Status.

The processor does not relinquish the channel (in response to BREQ)
when LOCK is active.

MPGM(1-0) MMU Programmable (3-State Outputs, Synchronous)

These outputs reflect the value of two PGM bits in the Translation
Look-Aside Buffer entry associated with the access. If no address
translation is performed, these signals are both Low.

PEN Pipeline Enable (Input, Synchronous)

This signal allows devices that can support pipelined accesses (i.e.,
that have input latches for the address and required controls) to
signal that a second access may begin while the first completes.

I(31-0) Instruction Bus (Inputs, Synchronous)

The Instruction Bus transfers instructions to the processor.

IREQ Instruction Request (3-State Output, Synchronous)

This signal requests an instruction access. When it is active, the
address for the access appears on the Address Bus.

IREQT Instruction Request Type (3-State Output, Synchronous)

This signal specifies the address space of an instruction request
when IREQ is active.

IREQT   Meaning
0       Instruction/random access memory access
1       Instruction read-only memory access

IRDY Instruction Ready (Input, Synchronous)

This input indicates that a valid instruction is on the Instruction Bus.
The processor ignores this signal if there is no pending instruction
access.

IERR Instruction Error (Input, Synchronous)

This input indicates that an error occurred during the current
instruction access. The processor ignores the content of the Instruction
Bus, and an Instruction Access Exception trap occurs if the processor
attempts to execute the invalid instruction. The processor ignores this
signal if there is no pending instruction access.

IBREQ Instruction Burst Request (3-State Output, Synchronous)

This signal is used to establish a burst-mode instruction access and
to request instruction transfers during a burst-mode instruction
access. IBREQ may be active even though the Address Bus is being
used for a data access. This signal becomes valid late in the cycle,
with respect to IREQ.

IBACK Instruction Burst Acknowledge (Input, Synchronous)

This input is active whenever a burst-mode instruction access has
been established. It may be active even though no instructions
currently are being accessed.

PIA Pipelined Instruction Access (3-State Output, Synchronous)

If IREQ is not active, this output indicates that an instruction access is
pipelined with another in-progress instruction access. The indicated
access cannot complete until the first access is complete. The
completion of the first access is signaled by the assertion of IREQ.

D(31-0) Data Bus (Bi-directional, Synchronous)

The Data Bus transfers data to and from the processor, for load and
store operations.

DREQ Data Request (3-State Output, Synchronous)

This signal requests a data access. When it is active, the address for
the access appears on the Address Bus.

DREQT(1-0) Data Request Type (3-State Outputs, Synchronous)

These signals specify the address space of a data access as follows
(the value x is a don't care).

DREQT1  DREQT0  Meaning
0       0       Instruction/data memory access
0       1       Input/output access
1       x       Coprocessor transfer

An interrupt/trap vector request is indicated as a data-memory read. If
required, the system can identify the vector fetch by the STAT(2-0)
outputs.

DRDY Data Ready (Input, Synchronous)

For loads, this input indicates that valid data is on the Data Bus. For
stores, it indicates that the access is complete, and that data need no
longer be driven on the Data Bus. The processor ignores this signal if
there is no pending data access.

DERR Data Error (Input, Synchronous)

This input indicates that an error occurred during the current data
access. For a load, the processor ignores the content of the Data
Bus. For a store, the access is terminated. In either case, a Data
Access Exception trap occurs. The processor ignores this signal if
there is no pending data access.

DBREQ Data Burst Request (3-State Output, Synchronous)

This signal is used to establish a burst-mode data access and to
request data transfers during a burst-mode data access. DBREQ
may be active even though the Address Bus is being used for an
instruction access. This signal becomes valid late in the cycle, with
respect to DREQ.

DBACK Data Burst Acknowledge (Input, Synchronous)

This input is active whenever a burst-mode data access has been
established. It may be active even though no data are currently being
accessed.

PDA Pipelined Data Access (3-State Output, Synchronous)

If DREQ is not active, this output indicates that a data access is
pipelined with another in-progress data access. The indicated access
cannot complete until the first access is complete. The completion of
the first access is signaled by the assertion of DREQ.

OPT(2-0) Option Control (3-State Outputs, Synchronous)

These outputs reflect the value of bits 18-16 of the load or store
instruction which begins an access. Bit 18 of the instruction is
reflected on OPT2, bit 17 on OPT1, and bit 16 on OPT0.

The standard definitions of these signals (based on DREQT) are as
follows (the value x is a don't care).

DREQT1  DREQT0  OPT2  OPT1  OPT0  Meaning
0       x       0     0     0     Word-length access
0       x       0     0     1     Byte access
0       x       0     1     0     Half-word access
0       0       1     0     0     Instruction ROM access (as data)
0       0       1     0     1     Cache control
0       0       1     1     0     Hardware-development system accesses
        -- All Others --          Reserved

During an interrupt/trap vector fetch, the OPT(2-0) signals indicate a
word-length access (000). Also, the system should return an entire,
aligned word for a read, regardless of the indicated data length.

The Am29050 microprocessor does not explicitly prevent a store to
the instruction ROM.

CDA Coprocessor Data Accept (Input, Synchronous)

This signal allows the coprocessor to indicate the acceptance of
operands or operation codes. For transfers to the coprocessor, the
processor does not expect a DRDY response; an active level on CDA
performs the function normally performed by DRDY. CDA may be
active whenever the coprocessor is able to accept transfers.

WARN Warn (Input, Asynchronous, Edge-Sensitive)

A high-to-low transition on this input causes a non-maskable WARN
trap to occur. This trap bypasses the normal trap vector fetch
sequence, and is useful in situations where the vector fetch may not
work (e.g., when data memory is faulty).

INTR(3-0) Interrupt Request (Inputs, Asynchronous)

These inputs generate prioritized interrupt requests. The interrupt
caused by INTR0 has the highest priority, and the interrupt caused by
INTR3 has the lowest priority. The interrupt requests are masked in
prioritized order by the Interrupt Mask field in the Current Processor
Status Register.

TRAP(1-0) Trap Request (Inputs, Asynchronous)

These inputs generate prioritized trap requests. The trap caused by
TRAP0 has the highest priority. These trap requests are disabled by
the DA bit of the Current Processor Status Register.

STAT(2-0) CPU Status (Outputs, Synchronous)

These outputs indicate the state of the processor's execution stage
on the previous cycle. They are encoded as follows:

STAT2  STAT1  STAT0  Condition
0      0      0      Halt or Step Modes
0      0      1      Pipeline Hold Mode
0      1      0      Load Test Instruction Mode, Synchronize
0      1      1      Wait Mode
1      0      0      Interrupt Return
1      0      1      Taking Interrupt or Trap
1      1      0      Non-Sequential Instruction Fetch
1      1      1      Executing Mode

CNTL(1-0) CPU Control (Inputs, Asynchronous)

These inputs control the processor mode:

CNTL1  CNTL0  Mode
0      0      Load Test Instruction
0      1      Step
1      0      Halt
1      1      Normal

RESET Reset (Input, Asynchronous)

This input places the processor in the Reset mode.

TEST Test Mode (Input, Asynchronous)

When this input is active, the processor is in Test mode. All outputs
and bi-directional lines, except MSERR, are forced to the
high-impedance state.

MSERR Master/Slave Error (Output, Synchronous)

This output shows the result of the comparison of processor outputs
with the signals provided internally to the off-chip drivers. If there is a
difference for any enabled driver, this line is asserted.

SYSCLK System Clock (Bi-directional)

This is either a clock output with a frequency that is half that of
INCLK, or an input from an external clock generator at the
processor's operating frequency.

INCLK Input Clock (Input)

When the processor generates the clock for the system, this is an
oscillator input to the processor, at twice the processor's operating
frequency. In systems where the clock is not generated by the
processor, this signal must be tied High or Low, except in certain
master/slave configurations as discussed in Section 5.8.

The following pins are not signal pins, but are named in Am29050 microprocessor
documentation because of their special role in the processor and system.

PWRCLK Power Supply for SYSCLK Driver

This pin is a power supply for the SYSCLK output driver. It isolates
the SYSCLK driver, and is used to determine whether or not the
Am29050 microprocessor generates the clock for the system. If
power (+5 volts) is applied to this pin, the Am29050 microprocessor
generates a clock on the SYSCLK output. If this pin is grounded, the
Am29050 microprocessor accepts a clock generated by the system
on the SYSCLK input.

PIN169 Alignment Pin

This pin is used to indicate proper pin-alignment of the Am29050
microprocessor. Hardware-development systems can use this pin to
communicate their presence to the system.
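
For test equipment or a logic-analyzer script, the STAT(2-0) and CNTL(1-0) encodings described in this section can be captured as lookup tables. The Python sketch below is illustrative only: the names are hypothetical, and it assumes that the bit patterns follow binary counting order in the sequence in which the conditions and modes are listed.

```python
# Assumed encodings: conditions listed in binary order, 000 through 111.
STAT_CONDITIONS = {
    0b000: "Halt or Step Modes",
    0b001: "Pipeline Hold Mode",
    0b010: "Load Test Instruction Mode, Synchronize",
    0b011: "Wait Mode",
    0b100: "Interrupt Return",
    0b101: "Taking Interrupt or Trap",
    0b110: "Non-Sequential Instruction Fetch",
    0b111: "Executing Mode",
}

# Assumed encodings: modes listed in binary order, 00 through 11.
CNTL_MODES = {
    0b00: "Load Test Instruction",
    0b01: "Step",
    0b10: "Halt",
    0b11: "Normal",
}

def decode_stat(stat2, stat1, stat0):
    """Return the execution-stage condition reported for the previous cycle."""
    return STAT_CONDITIONS[(stat2 << 2) | (stat1 << 1) | stat0]

def decode_cntl(cntl1, cntl0):
    """Return the processor mode selected by the CNTL inputs."""
    return CNTL_MODES[(cntl1 << 1) | cntl0]
```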



5.2 CHANNEL DESCRIPTION 

The processor channel provides the bandwidth required for performance, while per- 
mitting the connection of many different types of devices. This section describes the 
channel, and methods of connecting devices and memories to the processor. 

The channel also is used for transfers to and from the coprocessor. Coprocessor
transfers are described in Section 6.2.

Timing diagrams for operations described in this chapter appear in Appendix A. 

5.2.1 Channel Overview 

The channel consists of three 32-bit synchronous buses with associated control and
status signals: the Address Bus, Data Bus, and Instruction Bus. The Address Bus
transfers addresses and control information to devices and memories. The Data Bus
transfers data to and from devices and memories. The Instruction Bus transfers
instructions to the processor from instruction memories. In addition, a set of signals
allows control of the channel to be relinquished to an external master.

There are five logical groups of signals performing five distinct functions, as follows 
(since some signals perform more than one function, a signal may appear in more 
than one group): 

1. Instruction Address Transfer and Instruction Access Requests: A(31-0), SUP/US,
MPGM(1-0), PEN, IREQ, IREQT, PIA, BINV.

2. Instruction Transfer: I(31-0), IBREQ, IRDY, IERR, IBACK.

3. Data Address Transfer and Data Access Requests: A(31-0), R/W, SUP/US,
LOCK, MPGM(1-0), PEN, DREQ, DREQT(1-0), OPT(2-0), PDA, BINV.

4. Data Transfer: D(31-0), DBREQ, DRDY, DERR, DBACK, CDA.

5. Arbitration: BREQ, BGRT, BINV.



5.2.2 User-Defined Signals 

There are two types of user-defined outputs on the processor to control devices and 
memories directly in a system-dependent manner. Each of these outputs is valid 
simultaneously with — and for the same duration as — the address for an access. 

The first set of user-defined signals, MPGM(1-0), is determined by the PGM bits in 
the Translation Look-Aside Buffer entry used in address translation. If address trans- 
lation is not performed, these outputs are both Low. 

The second set of signals, OPT(2-0), is determined by bits 18-16 of the load or
store instruction that initiates an access. These signals are valid only for data
accesses, and have a pre-defined interpretation for coprocessor data transfers.

Standard interpretations of OPT(2-0) are given in Section 5.1. Since the OPT(2-0)
signals are determined by instructions, they have an impact on application-software
compatibility, and system hardware should use the given definitions of OPT(2-0). The
OPT(2-0) signals are used to encode byte and half-word accesses. However, for a
load, the system should return an entire, aligned word, regardless of the indicated
data width.

Note that the standard interpretations of OPT(2-0) apply only to accesses to
instruction/data memory and input/output. Other interpretations may be used for coprocessor
transfers.

For interrupt and trap vector fetches, the MPGM(1-0) and OPT(2-0) outputs are
all Low.
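
A slave-side decoder for the data request type and option signals might look like the following Python sketch. It is illustrative only: the function name and returned strings are hypothetical, and the bit patterns assume the standard OPT(2-0) interpretations of Section 5.1 (000 word, 001 byte, 010 half-word, 100 instruction ROM as data, 101 cache control, 110 hardware-development system, with the last three defined only for instruction/data memory accesses).

```python
def decode_data_access(dreqt1, dreqt0, opt2, opt1, opt0):
    """Classify a data access from its DREQT(1-0) and OPT(2-0) values."""
    if dreqt1 == 1:
        # Coprocessor transfers may use other OPT interpretations.
        return "Coprocessor transfer"
    opt = (opt2 << 2) | (opt1 << 1) | opt0
    space = "Input/output" if dreqt0 else "Instruction/data memory"
    widths = {0b000: "Word-length access",
              0b001: "Byte access",
              0b010: "Half-word access"}
    if opt in widths:
        return f"{space}: {widths[opt]}"
    special = {0b100: "Instruction ROM access (as data)",
               0b101: "Cache control",
               0b110: "Hardware-development system access"}
    if dreqt0 == 0 and opt in special:
        return special[opt]
    return "Reserved"
```

Whatever width is decoded, a load should still be answered with an entire, aligned word, as stated above.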



5.2.3 Instruction Accesses 



Instruction accesses occur to one of two address spaces: instruction/data memory 
and instruction read-only memory (instruction ROM). The distinction between these 
address spaces is made by the IREQT signal, which is in turn derived from the ROM 
Enable (RE) bit of the Current Processor Status Register. These are truly distinct 
address spaces; each may be populated independently based on the needs of a 
particular system. 

Instruction/data memory contains both instructions and data. Although the channel
supports separate instruction and data memories, the Memory Management Unit
does not. In certain systems, it may be required to access instructions via loads and
stores, even though instructions may be contained in physically separate memories.
For example, this requirement might be imposed because of the need to load
instructions into memory. Note also that the OPT(2-0) signals may be used to allow the
access of instructions in instruction ROM, using loads; the Am29050 microprocessor
does not prevent a store to the instruction ROM, and protection against stores to the
instruction ROM must be provided externally, if required.

All processor instruction fetches are read accesses, and the R/W signal is High for all
instruction fetches.



5.2.4 Data Accesses 



Data accesses occur to one of three address spaces: instruction/data memory,
input/output (I/O), and the coprocessor. The distinction between these spaces is made by
the DREQT(1-0) signals, which are in turn determined by the load or store instruction
that initiates a data access. Each of these address spaces is distinct from the
others.

The protocol for data transfers to and from the coprocessor is slightly different from
the protocol for instruction/data memory and I/O accesses. These transfers are
described in Section 6.2.

Data accesses may occur either from a slave device or memory to the processor (for
a load), or from the processor to a slave device or memory (for a store). The direction
of transfer is determined by the R/W signal. In the case of a load, the processor
requires that data on the Data Bus be held valid only for a short time before the end of a
cycle. In the case of a store, the processor drives the Data Bus as soon as the bus is
available and holds the data valid until the slave device or memory signals that the
access is complete.

5.2.5 Reporting Errors

The successful completion of an instruction access is indicated by an active level on
the IRDY input, and the successful completion of a data access is indicated by an
active level on the DRDY input. If there are exceptional conditions for which an
instruction or data access cannot complete successfully, the unsuccessful completion is
indicated by an active level on the IERR or DERR input, as appropriate.



If the processor receives an IERR or DERR in response to an instruction or data
access, it ignores the content of the Instruction or Data Bus and the value of IRDY or
DRDY. An IERR response causes an Instruction Access Exception trap or a Monitor
trap, unless it is associated with an instruction that the processor does not ultimately
execute (because of a non-sequential instruction fetch). A DERR response always
causes either a Data Access Exception trap, a Coprocessor Exception trap, or a
Monitor trap.
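
The way the error inputs override the ready inputs can be summarized in a short Python sketch. The function names and returned strings are hypothetical, the sketch assumes an access is pending (the processor ignores these inputs otherwise), and it does not model which specific trap is taken.

```python
def instruction_response(irdy, ierr, executed):
    """Outcome of a pending instruction access for one cycle's IRDY/IERR levels."""
    if ierr:
        # IERR takes precedence; IRDY and the Instruction Bus are ignored.
        return "trap" if executed else "ignored"
    return "accepted" if irdy else "pending"

def data_response(drdy, derr):
    """Outcome of a pending data access for one cycle's DRDY/DERR levels."""
    if derr:
        # DERR always results in a trap; DRDY and the Data Bus are ignored.
        return "trap"
    return "complete" if drdy else "pending"
```

The executed argument reflects the one exception noted above: an errored instruction that is never executed, because of a non-sequential fetch, causes no trap.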

The processor supports the restarting of unsuccessful accesses upon an interrupt
return. In the case of an unsuccessful instruction access, the restart is performed by
the Program Counter 0 and Program Counter 1 registers. In the case of an
unsuccessful data access, the restart is performed by the Channel Address, Channel Data,
and Channel Control registers. In any event, the control program must determine
whether or not an access can and/or should be restarted.

The Instruction Access Exception and Data Access Exception traps cannot be
masked. If one of these traps occurs within an interrupt or trap handler, a Monitor trap
occurs.



5.2.6 Access Protocols 



Figure 5-1 shows a control flowchart for accesses performed by the Am29050 micro- 
processor. This control flow applies independently to both instruction and data ac- 
cesses. Since the processor performs concurrent instruction and data accesses, 
these accesses may be at different points in the control flow at any given point in time. 

Note that the items on the flowchart of Figure 5-1 do not represent actual states, and 
have no particular relationship to processor cycles. The flowchart provides only a 
high-level understanding of the control flow. Also, exceptions and error conditions are 
not shown. 

The channel supports three protocols for accesses: simple, pipelined, and burst- 
mode. These are described in the following sections. The various protocols are de- 
fined to accommodate minimum-latency accesses as well as maximum-transfer-rate 
accesses. The protocols allow an access to complete in a single cycle, although they 
support accesses requiring arbitrary numbers of cycles. Address transfers for ac- 
cesses may be independent of instruction or data transfers. 

5.2.7 Simple Accesses 

For a simple access, the processor holds the address valid throughout the entire 
access. This protocol is used for single-cycle accesses, and for accesses to simple 
devices and memories. 

On any cycle before the completion of the access, a simple access may be converted 
to a pipelined access (by the assertion of PEN) or to a burst-mode access (by the 
assertion of IBACK or DBACK, if the processor is asserting IBREQ or DBREQ). Thus, 
the protocol for simple accesses also may be used during the initial cycles of 
pipelined and/or burst-mode accesses. This is advantageous, for example, in cases 
where the slave device or memory either requires the address to be held for multiple 
cycles at the beginning of the pipelined or burst-mode access, or cannot respond to 
the pipelined or burst-mode request within one cycle. 

Figure 5-1 



5.2.8 Pipelined Accesses 



A pipelined access is one that starts before an earlier in-progress access completes. 
The in-progress access is called a primary access, and the second access is called a 
pipelined access. A pipelined access is of the same type as the primary access. For 
example, an instruction access that begins before the completion of a data access is 
not considered to be a pipelined access, whereas a second data access is. 

The Am29050 microprocessor allows only one pipelined access at any given time, 
and does not perform pipelined accesses for the Load Multiple and Store Multiple 
instructions. 

5.2.8.1 TRADEOFFS 

For accesses that require more than one cycle to complete, pipelined accesses per- 
form better than simple accesses, because they allow the overlap of portions of two 
accesses. In addition, the ability to latch addresses in support of pipelined accesses 
reduces utilization of the Address Bus, thereby reducing contention between instruc- 
tion and data accesses. However, devices and memories that support pipelined ac- 
cesses are somewhat more complex than devices and memories that support only 
simple accesses. 

Support for pipelined operations is required for both the primary access and the 
pipelined access. The slave performing the primary access must contain some means 
for storing the address and other information about the access. The slave performing 
the pipelined access must be able to restrict its use of the Instruction Bus or Data 
Bus, and must be prepared to cancel the access (as explained below). 

5.2.8.2 PIPELINED OPERATION 

Pipelined accesses are controlled by the signals PEN, PIA, and PDA. Because of 
internal data-flow constraints, the Am29050 microprocessor does not perform a 
pipelined store operation while a load is in progress. However, the protocol does not 
restrict pipelined operations. Other channel masters may perform a pipelined store 
during a load. 

Except as noted above, the processor attempts to perform pipelining for every 
access; the input PEN indicates whether or not pipelining is supported for a given 
access. The PEN input can be driven by individual devices, or can be tied active or 
inactive to enable or disable system-wide pipelined accesses. The processor ignores 
the value of PEN unless it is performing an access. 

The processor samples PEN on every cycle during a primary access. If PEN is active 
on any cycle, the processor may cease to drive the address and associated controls 
for the primary access in the next cycle. Following this, PEN must remain active. If the 
processor requires another access before the primary access completes, it drives the 
address and controls for the second access, asserting PIA or PDA to indicate that the 
second access is a pipelined access. 



The output IREQ or DREQ, as appropriate, is not asserted for a pipelined access. 
Devices and memories that cannot support pipelined accesses should therefore 
ignore PIA and/or PDA, and base their operation upon IREQ and/or DREQ. 

A device or memory that receives a request for a pipelined access may treat it as any 
other access, with one exception: the pipelined access cannot use the Instruction or 
Data Bus or the associated controls (e.g., IRDY or DRDY). In the case of a data 
read or instruction access, the results of the pipelined access cannot be driven on the 
appropriate bus. In the case of a data write, the data does not appear on the Data 
Bus. Any other operations for the access, such as address decoding, can occur. 



When the primary access completes (as indicated by IRDY or DRDY), the pipelined 
access becomes a primary access. The processor indicates this by asserting IREQ or 
DREQ, depending on the type of access. The device or memory performing the 
pipelined access may complete the access as soon as IREQ or DREQ is asserted 
(possibly in the same cycle). When the access becomes a primary access, it controls 
the channel as any other primary access. For example, it may determine whether or 
not another pipelined access can be performed. 

When the pipelined access becomes a primary access, the output PIA or PDA 
remains asserted for one cycle, to insure continuity of control within the slave device or 
memory. In the cycle after IREQ or DREQ is asserted, PIA or PDA is de-asserted, 
unless the processor initiates another pipelined access, in which case PIA or PDA 
remains asserted for the new access. 

5.2.8.3 CANCELLATION OF PIPELINED ACCESSES 

If the processor takes an interrupt or trap before a pipelined access becomes a 
primary access, the request for the pipelined access is removed from the channel. This 
may occur, for example, when IERR or DERR is signaled for the primary access. 

If the pipelined access is removed from the channel, the slave device or memory does 
not receive an IREQ or DREQ for the pipelined access. Hence, the pipelined access 
does not become a primary access, and cannot complete. A pipelined access may be 
canceled in this manner at any time before it becomes a primary access. Because of 
this, a pipelined access should not change the state of a slave device or memory until 
the pipelined access becomes a primary access. 
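
The rule above can be illustrated with a minimal sketch of a slave handling a pipelined data write. The class and method names (PipelinedWriteSlave, pipelined_request, becomes_primary, request_removed) are hypothetical and are not part of the channel definition; the sketch assumes only what the text states, namely that the address arrives with the pipelined request, that the data arrives once the access becomes primary, and that no slave state may change before then.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class PipelinedWriteSlave:
    """Toy slave: latches a pipelined write address, commits only when primary."""
    memory: Dict[int, int] = field(default_factory=dict)
    latched_address: Optional[int] = None

    def pipelined_request(self, address: int) -> None:
        # PDA asserted: the address may be decoded and latched,
        # but the slave's visible state must not change yet.
        self.latched_address = address

    def becomes_primary(self, data: int) -> None:
        # DREQ asserted: the access is now primary, the data is on the
        # Data Bus, and the write may finally take effect.
        if self.latched_address is not None:
            self.memory[self.latched_address] = data
            self.latched_address = None

    def request_removed(self) -> None:
        # Interrupt or trap taken first: discard the request, no side effects.
        self.latched_address = None
```

Because the write is applied only in becomes_primary, a canceled pipelined access leaves the memory contents untouched.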

5.2.9 Burst-Mode Accesses 

A burst-mode access allows multiple instructions or data words at sequential 
addresses to be accessed with a single address transfer. The number of accesses 
performed, and the timing of each access within the sequence, are controlled 
dynamically by the burst-mode protocol. Burst-mode accesses take advantage of 
sequential addressing patterns, and provide several benefits over simple and pipelined 
accesses: 

1. Simultaneous instruction and data accesses. Burst-mode accesses reduce the 
utilization of the Address Bus. This is especially important for instruction 
accesses, which are normally sequential. Burst-mode instruction accesses 
eliminate most of the address transfers for instructions, allowing the Address Bus 
to be used for simultaneous data accesses. 

2. Faster access times. By eliminating the address-transfer cycle, burst-mode 
accesses allow addresses to be generated in a manner which improves access 
times. 

3. Faster memory access modes. Many memories have special high-bandwidth 
access modes (e.g., static-column page mode and nibble mode). These modes 
generally require a sequential addressing pattern, even though addresses may 
not be presented explicitly to the memory for all accesses. Burst-mode accesses 
allow the use of these access modes, without hardware to detect sequential 
addressing patterns. 



5.2.9.1 BURST-MODE OVERVIEW 

The control-flow diagrams in Figure 5-2 and Figure 5-3 illustrate the operation of the 
processor and an instruction memory during a burst-mode instruction access. The 
control-flow diagrams in Figure 5-4 and Figure 5-5 illustrate the operation of the 
processor and a data memory or device during a burst-mode data access. These 
diagrams are for illustration only; nodes on these diagrams do not necessarily 
correspond to processor or slave states, and transitions on these diagrams do not 
necessarily correspond to processor cycles. 

Figure 5-3 Slave Burst-Mode Instruction Accesses: Control Flow 

Note: A similar state transition may be used to support suspended burst-mode data accesses 
or a channel master other than the processor. 

A burst-mode access is in one of the following operational conditions at any given 
time. 

Established — The processor and slave device have successfully initiated the 
burst-mode access. A burst-mode access that has been established is either 
active or suspended. An established burst-mode access may become preempted, 
terminated, or canceled. 

Active — Instruction or data accesses and transfers are being performed as the result 
of the burst-mode access. An active burst-mode access may become suspended. 

Suspended — No accesses or transfers are being performed as the result of the 
burst-mode access, but the burst-mode access remains established. Additional 
accesses and transfers may occur at some later time (i.e., the burst-mode access may 
become active) without the re-transmission of the address for the access. 

Preempted — The burst-mode access can no longer continue because of some 
condition, but the burst-mode access can be re-established within a short amount of time. 

Terminated — All required accesses have been performed. 

Canceled — The burst-mode access can no longer continue because of some 
exceptional condition. The access may be re-established only after the exceptional 
condition has been corrected, if possible. 

Figure 5-4 Processor Burst-Mode Data Accesses: Control Flow 

Note: The Am29050 microprocessor does not suspend burst-mode data accesses. 

Each of the preceding conditions, except for the terminated condition, is under the 
control of both the processor and slave device or memory. The terminated condition is 
determined by the processor, since only the processor can determine that all required 
accesses have been performed. The following sections discuss each of the above 
conditions with respect to the burst-mode protocol. 

Figure 5-5 Slave Burst-Mode Data Accesses: Control Flow 

5.2.9.2 ESTABLISHING BURST-MODE ACCESSES 

The Am29050 microprocessor attempts to perform all instruction prefetches using 
burst-mode accesses, except for instruction fetches at the last word before a 1-kb 
address boundary. For data accesses, the processor attempts to perform 
load-multiple and store-multiple operations using burst-mode accesses. The processor 
indicates that it desires a burst-mode access by asserting IBREQ or DBREQ during 
the cycle in which the initial address is placed on the Address Bus (however, note that 
these signals become valid later in the cycle than the address). 



The inputs IBACK and DBACK indicate that a requested burst-mode access is 
supported. The processor ignores the value of IBACK unless IBREQ is asserted, and it 
ignores the value of DBACK unless DBREQ is asserted. 



When it desires a burst-mode access, the processor continues to drive IBREQ or 
DBREQ on every cycle for which the address is valid on the Address Bus. During this 
time, the device or memory involved in the access may assert IBACK or DBACK to 
indicate that it can perform the burst-mode access. If IBACK or DBACK (as 
appropriate) is asserted while the initial address appears on the Address Bus, the 
burst-mode access is established. In the following cycle, the processor removes the 
request address and de-asserts IREQ or DREQ. However, it continues to assert 
IBREQ or DBREQ. 

If the burst-mode access is not established on the first access, the processor attempts 
to establish a burst-mode access on each subsequent address transfer, as long as 
there are more accesses yet to be performed. During any subsequent access, the 
addressed device or memory may establish a burst-mode access by asserting IBACK 
or DBACK. If the burst-mode access is never established, the default behavior is to 
have the processor transmit an address for every access. 
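
The effect of establishment on Address Bus traffic can be sketched as follows. This is a simplified model, not the processor's actual logic: the function name address_transfers and the callback slave_acks are hypothetical, and the model ignores preemption, suspension, and 1-kb boundary crossings. It assumes only that the processor keeps requesting burst mode on each address transfer while more accesses remain, and that once IBACK or DBACK is seen the remaining accesses need no further address.

```python
def address_transfers(num_accesses, slave_acks):
    """Count address transfers for a run of sequential accesses.

    slave_acks(i) returns True if the slave asserts IBACK/DBACK while the
    address for access i is on the Address Bus (hypothetical callback).
    """
    transfers = 0
    established = False
    for i in range(num_accesses):
        if established:
            continue  # the burst supplies this access without a new address
        transfers += 1
        more_remaining = i < num_accesses - 1
        # IBREQ/DBREQ is driven only while further accesses are required.
        if more_remaining and slave_acks(i):
            established = True
    return transfers
```

For eight sequential accesses, a slave that acknowledges immediately needs one address transfer, while a slave that never acknowledges needs eight.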

5.2.9.3 ACTIVE AND SUSPENDED BURST-MODE ACCESSES 



After the burst-mode access is established, IBREQ and DBREQ are used during 
subsequent accesses to indicate that the processor requires at least one more 
access. If IBREQ or DBREQ is active at the end of the cycle in which an access 
successfully completes (i.e., when IRDY or DRDY is active), the processor requires 
another access. If the slave device or memory previously has not preempted the 
burst-mode access, and does not preempt (by de-asserting IBACK or DBACK) or cancel 
(by asserting IERR or DERR) the burst-mode access in the cycle that the access 
completes, the additional access must be performed. 

The execution rate of instructions is known only dynamically, so that in certain 
situations, a burst-mode instruction access must be suspended. If IBREQ is inactive 
during the cycle in which an instruction access completes, the burst-mode access is 
suspended (if it is neither preempted nor canceled at the same time). The burst-mode 
access remains suspended unless the processor requests a new instruction access 
(in which case IREQ is asserted), or unless the instruction memory preempts the 
burst-mode access. 

A suspended burst-mode instruction access becomes active whenever the processor 
can accept more instructions. The processor activates the burst-mode access by 
asserting IBREQ. If the instruction memory does not preempt the burst-mode access 
during this cycle, an instruction access must be performed. 

When a suspended burst-mode instruction access is activated, the resulting 
instruction access is not permitted to complete in the cycle in which IBREQ is asserted, but 
may complete in the next cycle. The reason for this restriction is that the burst-mode 
protocol is defined such that the combination of an active level on IBREQ and IRDY 
causes an instruction access (as previously discussed). If the instruction access 
completes immediately in the cycle that a suspended burst-mode access is activated, 
there is an ambiguity in the protocol: it is possible to interpret a single-cycle assertion 
of IBREQ as a request for two instructions. 

The above ambiguity is resolved by delaying the instruction access resulting from a 
re-activated burst-mode access for a cycle. Since this restriction applies only when 
the Instruction Prefetch Buffer is full and the instruction memory is capable of a very 
fast access, the delayed instruction response has no performance impact. 
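
The ambiguity can be made concrete with a small trace interpreter. This is an illustrative reading of the two rules in the text (re-activation by IBREQ, and "IBREQ active as an access completes means one more access"), not a model of any real instruction memory. The function name requests_seen and the trace format, one (IBREQ, IRDY) pair of booleans per cycle starting from a suspended burst, are assumptions made for the example.

```python
def requests_seen(trace):
    """Count instruction requests a slave would infer from a signal trace.

    trace: list of (ibreq, irdy) booleans, one pair per cycle, beginning
    with the burst-mode access in the suspended condition.
    """
    suspended = True
    requests = 0
    for ibreq, irdy in trace:
        if suspended and ibreq:
            requests += 1        # re-activation asks for one instruction
            suspended = False
        if irdy and not suspended:
            if ibreq:
                requests += 1    # IBREQ active as an access completes
            else:
                suspended = True # no further request: burst suspends again
    return requests
```

If the access completes in the activation cycle, the trace [(True, True)] is read as two requests; with the required one-cycle delay, [(True, False), (False, True)] is read as exactly one.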

The Am29050 microprocessor does not suspend burst-mode data accesses, because 
the data transfers occur to and from general-purpose registers, which are always 
[...] (during direct memory accesses, for example). The principles for suspending 
burst-mode accesses are the same as those for instruction accesses discussed above. 

5.2.9.4 PROCESSOR PREEMPTION, TERMINATION, AND CANCELLATION 

The processor may preempt, terminate, or cancel a burst-mode access by 
de-asserting IBREQ or DBREQ, and asserting IREQ or DREQ at some later point. During the 
period after IBREQ or DBREQ is de-asserted and before IREQ or DREQ is asserted, 
the burst-mode access is in a suspended condition. Normally, the processor receives 
one more instruction or data word after IBREQ or DBREQ is de-asserted. However, 
this access may complete in the same cycle that IBREQ or DBREQ is de-asserted. 
Note that the processor may de-assert IBREQ or DBREQ without receiving an 
IBACK or DBACK to acknowledge the burst-mode access. 

The slave device or memory cannot distinguish between preempted, terminated, and 
canceled burst-mode accesses, when these are caused by the processor, until the 
processor asserts IREQ or DREQ. If the slave continues to assert IBACK or DBACK 
after IBREQ or DBREQ is de-asserted, the slave should be prepared to accept any 
new request during the cycle that IREQ or DREQ is asserted to begin the new access. 
The reason for this is that the processor may attempt to establish a burst-mode 
access for the new access: if the slave is asserting IBACK or DBACK because of a 
previously preempted, terminated, or canceled burst-mode access, the processor 
interprets the active IBACK or DBACK as establishing the new burst-mode access 
and removes the request in the following cycle. 

The processor preempts a burst-mode access when an external channel master 
arbitrates for the channel, or when a burst-mode fetch crosses a potential virtual-page 
boundary. Since the minimum page size is 1 kb, burst-mode instruction and data 
accesses are preempted whenever the address sequence crosses a 1-kb address 
boundary. The burst is re-established as soon as a new address translation is per- 
formed (if required). A new physical address is transmitted when the burst-mode 
access is re-established. 

Note that the preemption resulting from page boundaries is advantageous for devices 
or memories that require counters to follow the burst-mode address sequence. Since 
all burst-mode accesses are word accesses, and the processor re-transmits an ad- 
dress at every 1-kb address boundary, an 8-bit counter in the slave device or memory 
is sufficient to follow the burst-mode address sequence. Additional address bits are 
simply latched. 
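
The counter width follows from simple arithmetic, shown in the sketch below. It assumes that a word is 4 bytes and that the 1-kb boundary is 1024 bytes; the class name BurstAddressFollower and its methods are hypothetical names for a slave-side address follower, not part of the channel definition.

```python
WORD_BYTES = 4          # assumption: 32-bit words
BOUNDARY_BYTES = 1024   # assumption: 1 kb = 1024 bytes

# 1024 / 4 = 256 word positions between re-transmitted addresses,
# so the counter needs enough bits to count 0..255.
COUNTER_BITS = (BOUNDARY_BYTES // WORD_BYTES - 1).bit_length()


class BurstAddressFollower:
    """Latches the upper address bits and counts words within a 1-kb region."""

    def __init__(self, address):
        self.load(address)

    def load(self, address):
        # Called whenever the processor transmits an address.
        self.upper = address // BOUNDARY_BYTES                # simply latched
        self.word = (address % BOUNDARY_BYTES) // WORD_BYTES  # counter value

    def advance(self):
        # One burst-mode transfer: step to the next sequential word.
        self.word = (self.word + 1) % (1 << COUNTER_BITS)

    def address(self):
        return self.upper * BOUNDARY_BYTES + self.word * WORD_BYTES
```

The counter never needs to carry into the latched bits, because the processor supplies a new address before the sequence crosses the boundary.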

The processor terminates a burst-mode access whenever all required instructions or 
data have been accessed. In the case of instruction accesses, the burst-mode access 
is terminated when a non-sequential fetch occurs. In the case of data accesses, the 
burst-mode access is terminated when the count indicates a single load or store 
remains. The last load or store is executed as a simple access. 

The processor cancels a burst-mode access when an interrupt or trap is taken. Note 
that a trap may be caused by the burst-mode access, for example when a Translation 
Look-Aside Buffer miss occurs on an address in the burst-mode sequence. If the 
processor cancels a burst-mode access when an access in the sequence remains to 
be completed, this access must be completed in spite of the cancellation. 

Canceled burst-mode data accesses may be restarted at some (possibly much later) 
point in execution via the Channel Address, Channel Data, and Channel Control 
registers. In this case, the burst-mode access is restarted at the point at which it was 
canceled, rather than at the beginning of the original address sequence. 

5.2.9.5 SLAVE PREEMPTION AND CANCELLATION 

The slave device or memory involved in a burst-mode access may preempt the 
access by de-asserting IBACK or DBACK. The processor samples IBACK and DBACK 
when IRDY and DRDY are active, so that IBACK and DBACK may be de-asserted as 
the last supported access is completed. However, IBACK and DBACK also may be 
de-asserted in any cycle before the access completes; to preempt the access, IBACK 
or DBACK must remain inactive until IRDY or DRDY is asserted. If IBACK or DBACK 
is de-asserted when the processor is in a state where it expects an access, the 
access must be completed. 

In general, the slave device or memory preempts the burst-mode access whenever it 
cannot support any further accesses in the burst-mode sequence. This normally 
occurs whenever an implementation-dependent address boundary is encountered 
(e.g., a cache-block boundary), but may occur for any reason. By preempting the 
burst-mode access, the slave receives a new request, with the address of the next 
instruction or data word required by the processor. 



The slave device or memory may cancel a burst-mode access by asserting IERR or 
DERR in response to a requested access. The signals IBACK or DBACK need not be 
de-asserted at this time, but should be de-asserted in the next cycle. 



Note that the IERR and DERR signals cause non-maskable traps, except in the case 
where IERR is asserted for an instruction which the processor does not execute. 



5.2.10 Arbitration 



External masters can gain access to the Address, Data, and Instruction buses by 
asserting the BREQ input. The processor completes any pending access, preempts 
any burst-mode access, and asserts the BGRT output. At this time, the processor 
places all channel outputs associated with the Address, Data, and Instruction buses in 
the high-impedance state. 



For the first cycle that BGRT is asserted, the output BINV is also asserted. If the 
external master cannot control the Address Bus and associated controls in the cycle that 
BGRT is asserted, the active level on BINV may be used to define an idle cycle for the 
channel (i.e., any spurious access requests are ignored). The BINV signal is asserted 
only for a single cycle, so the external master must take control of the channel in the 
cycle after BGRT is asserted. 



While the BREQ input remains asserted, the processor continues to assert BGRT. The 
external master has control over the channel during this time. 



To release the channel to the processor, the external master de-asserts BREQ, but 
must continue to control the channel for the first cycle in which BREQ is de-asserted. 
In the cycle after BREQ is de-asserted, the processor asserts BINV and de-asserts 
BGRT; the external master should release control of the channel at this time. On the 
following cycle, the processor de-asserts BINV, and is able to use the channel. The 
processor re-establishes any burst-mode access preempted by arbitration. 
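
The grant and release sequence can be sketched cycle by cycle. The model below is a simplification under stated assumptions: signals are treated as logical "asserted" booleans rather than electrical levels, the processor is assumed to have no pending access, and it is assumed to respond to BREQ in the following cycle. The function name arbitration_outputs is hypothetical.

```python
def arbitration_outputs(breq_trace):
    """Return (bgrt, binv) lists for a per-cycle BREQ trace.

    Assumption: BGRT in a cycle mirrors BREQ from the previous cycle.
    BINV is asserted in the first cycle BGRT is asserted and in the
    cycle BGRT is de-asserted, marking the hand-over in each direction.
    """
    bgrt, binv = [], []
    prev_breq = False
    prev_bgrt = False
    for req in breq_trace:
        grant = prev_breq
        bgrt.append(grant)
        binv.append(grant != prev_bgrt)
        prev_breq, prev_bgrt = bool(req), grant
    return bgrt, binv
```

For a BREQ trace asserted during cycles 1 through 3, BGRT is asserted during cycles 2 through 4, and BINV is asserted in cycle 2 (grant) and cycle 5 (release), after which the processor may use the channel again.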



The processor does not relinquish the channel when the LOCK signal is active. This 
prevents external masters from interfering with exclusive accesses. 



5.2.11 Use of BINV to Cancel an Access 



Besides using the BINV signal to transfer control of the channel from one master to 
another, the Am29050 microprocessor uses the BINV signal to cancel accesses after 
they have been initiated. To cancel an access, BINV is asserted during a cycle in 
which IREQ or DREQ also is asserted. If an access is canceled, the accompanying 
response (using IRDY, IERR, DRDY, or DERR) is ignored during the cycle that BINV is 
asserted; thereafter, the system should not respond to the canceled access. 

The BINV signal is used to cancel an instruction access in the following situations: 

• When an interrupt or trap is taken; 

• When an instruction fetch-ahead is canceled because a target block is only partially 
present in the Branch Target Cache memory; 

• When an instruction TLB miss or protection violation occurs on an instruction 
access; 

• When a branch instruction is the delay instruction of another branch, and the targets 
of both branches are in the Branch Target Cache memory (in this case, the external 
fetch for the target of the first branch is not required); and 

• When the processor enters the Load Test Instruction Mode, and there is an active 
instruction request on the channel. 



The BINV signal is used to cancel a data access in the following situations: 

• When a data TLB miss or protection violation occurs on the data access; and 

• When an interrupt or trap is taken in the cycle that a data access appears on the 
channel. 

When a LOADSET instruction encounters a protection violation because store access 
is not permitted, the processor cancels the load access with BINV. 

5.2.12 Bus Sharing — Electrical Considerations 

When buses are shared among multiple masters and slaves, it is important to avoid 
situations where these devices are driving a bus at the same time. This may occur 
when more than one master or slave is allowed to drive a bus in the same cycle, if 
bus arbitration is incompletely or incorrectly performed. However, it also occurs when 
a master or slave releases a bus in the same cycle that another master or slave gains 
control, and the first master or slave is slow in disabling its bus drivers, compared to 
the point at which the second master or slave begins to drive the bus. The latter situ- 
ation is called a bus collision in the following discussion. 

In addition to the logical errors that can occur when multiple devices drive a bus 
simultaneously, such situations may cause bus drivers to carry large amounts of 
electrical current. This can have a significant impact on driver reliability and power 
dissipation. Since bus collisions usually occur for a small amount of time, they are 
of less concern, but may contribute to high-frequency electromagnetic emissions. 

The Am29050 microprocessor channel is defined to prevent all situations where 
multiple drivers are driving a bus simultaneously. However, bus collisions may be 
allowed to occur, depending on the system design. 

In the case of the Am29050 microprocessor channel, arbitration for the channel pre- 
vents the processor from driving the Address and Data buses at the same time as 
another channel master. If there is more than one external master, the system design 
must include some means for insuring that only one external master gains control of 
the channel, and that no external master gains control of the channel at the same 
time as the processor. 

When the processor relinquishes control of the channel to an external master, bus 
collisions may be prevented by not allowing the external master to drive any bus 
while BINV is active. This insures that all processor outputs are disabled by the time 
the external master takes control of the channel. However, there is nothing in the 
channel protocol to prevent the external master from taking control as soon as BGRT 
is asserted. 

Slave devices and memories are prevented from simultaneously driving the 
Instruction Bus or Data Bus by allowing only the device or memory performing a primary 
access to drive the appropriate bus. When a pipelined access becomes a primary 
access, it may drive the Instruction or Data Bus immediately, so that there is a 
potential bus collision if the pipelined access is performed by a slave other than the slave 
performing the original primary access. This bus collision may be prevented by 
restricting all slaves to driving the Instruction and Data buses in the second half-cycle 
(using SYSCLK, for example). Since the processor samples data only at the end of a 
cycle, this restriction does not affect performance. 

When the processor performs a store immediately following a load, it drives the Data 
Bus and asserts DREQ for the store in the second cycle following the cycle in which 
the data for the load appears on the Data Bus. This provides a complete cycle for the 
slave involved in the load to disable its data drivers. The processor continues to drive 
the Data Bus until it receives a DRDY or DERR in response to the store; it ceases to 
drive the Data Bus in the cycle following the response. 

5.2.13 Channel Behavior for Interrupts and Traps 

If an interrupt or trap is taken, any burst-mode accesses are canceled. If a request 
for a pipelined access is on the Address Bus, this request is removed. Any other 
accesses are completed, and no new accesses are started, other than those 
required for the interrupt or trap. Note that any accesses that the processor expects 
to complete must be completed, even though burst-mode and pipelined accesses 
are canceled. 

When interrupt or trap processing is complete, any canceled burst-mode accesses 
are re-established, using the address of the access that was to be 
performed next when the interrupt or trap was taken. Uncompleted pipelined accesses 
are restarted, either by the interrupt return sequence in the case of an instruction 
access, or by restarting the initiating instruction in the case of a data access. 

Note that the restarting of a pipelined access is not performed by the Channel 
Address, Channel Data, and Channel Control registers, since these registers may be 
required to restart the primary access. The instruction initiating the pipelined access 
is not allowed to complete until the primary access completes, so that the Program 
Counter 1 (PC1) Register contains the address of the initiating instruction when a 
pipelined access is canceled. The address in PC1 can restart this instruction on 
interrupt return. 



5.2.14 Effect of the LOCK Output 



The LOCK output provides synchronization and exclusion of accesses in a 
multiprocessor environment. LOCK has no pre-defined effect for a system, other than the fact 
that the Am29050 microprocessor does not grant the channel to an external master 
while LOCK is active. 



The LOCK output is asserted for the address cycle of the Load-and-Lock and Store- 
and-Lock instructions, and is asserted for both the read and write accesses of a Load 
and Set instruction. LOCK may also be active for an extended period of time, under 
control of the Lock bit in the Current Processor Status Register (this capability is 
available only to Supervisor-mode programs). 



LOCK may be defined to provide any level of resource locking for a particular system. 
For example, it may lock the channel, an individual device or memory, or a location 
within a device or memory. 

When a resource is locked, it is available for access only by the processor with the 
appropriate access privilege. The mechanisms for restricting accesses, and the 
methods for reporting attempted violations of the restrictions, are system-dependent. 

5.3 TEST/DEVELOPMENT INTERFACE 



The Test/Development Interface consists of the inputs CNTL(1-0) and TEST, and the 
outputs STAT(2-0). The CNTL(1-0) inputs provide control of processor operation, 
and the STAT(2-0) outputs provide information about processor operation for external 
monitoring. 

A hardware-development system uses CNTL(1-0) and STAT(2-0) to control the 
processor for the purposes of processor and system debug. 



A hardware tester uses the TEST input to place all processor outputs in the 
high-impedance state. This allows the tester to check other system logic by driving processor 
outputs directly, without requiring that the processor be removed from the system. 

5.3.1 Processor Status Outputs 

The STAT(2-0) outputs indicate certain information about processor modes, along 
with other information about processor operation. STAT(2-0) may be used to provide 
feedback of processor behavior during normal processor operation and when the 
processor is under the control of a hardware-development system. 

The encoding of STAT(2-0) is as follows: 

STAT2   STAT1   STAT0   Mode or Condition 
  0       0       0     Halt or Step Modes 
  0       0       1     Pipeline Hold Mode 
  0       1       0     Load Test Instruction Mode, Synchronize 
  0       1       1     Wait Mode 
  1       0       0     Interrupt Return 
  1       0       1     Taking Interrupt or Trap 
  1       1       0     Non-Sequential Instruction Fetch 
  1       1       1     Executing Mode 



On any given cycle, the STAT(2-0) signals reflect the state of the processor's execute 
stage on the previous cycle. Where the conditions listed above are not mutually 
exclusive, the condition listed first is the one reflected on STAT(2-0). 
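
The encoding and the "listed first" priority rule can be sketched in a few lines of 
Python. This is only an illustration for use in a logic-analyzer script or simulator; 
the names STAT_MODES, decode_stat, and reported_condition are hypothetical and are 
not part of any AMD tool. 

```python
# Illustrative decoder for the STAT(2-0) encoding tabulated above.
# All names here are hypothetical; only the encoding comes from the table.
STAT_MODES = {
    0b000: "Halt or Step Modes",
    0b001: "Pipeline Hold Mode",
    0b010: "Load Test Instruction Mode, Synchronize",
    0b011: "Wait Mode",
    0b100: "Interrupt Return",
    0b101: "Taking Interrupt or Trap",
    0b110: "Non-Sequential Instruction Fetch",
    0b111: "Executing Mode",
}


def decode_stat(stat2, stat1, stat0):
    """Return the mode or condition named by three sampled STAT pin levels."""
    code = (stat2 << 2) | (stat1 << 1) | stat0
    return STAT_MODES[code]


def reported_condition(active):
    """Given a set of simultaneously active conditions, return the one that
    would be reported: the condition listed first in the table."""
    for code in sorted(STAT_MODES):
        if STAT_MODES[code] in active:
            return STAT_MODES[code]
    return None
```

For example, if a Pipeline Hold coincides with an Executing condition, the sketch 
reports the Pipeline Hold, since it appears earlier in the table. 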

The first cycle of a multi-cycle instruction (Load Multiple, Store Multiple, Interrupt 
Return, or Interrupt Return and Invalidate) is indicated as an "Executing Mode" cycle. 
When an interrupt or trap is taken, the first cycle is indicated as a "Taking Interrupt or 
Trap" cycle. Additional cycles of these multi-cycle operations are indicated as "Pipe- 
line Hold" cycles. 

A Low level on STAT2 indicates that the processor is idle, and may be used as an 
indication of processor performance. Since most processor instructions execute in a 
single cycle, and since extra cycles spent executing multiple-cycle operations are 
counted as Pipeline Hold cycles, a count of the number of cycles within a given time 
interval that the processor is not idle (i.e., a count of the number of cycles for which 
STAT2 is High) is a close approximation to the number of instructions executed within 
that interval, and thus approximates the instruction-execution rate. The only source of 
error in this approximation is the cycles in which the processor takes an interrupt or 
trap. If desired, this source of error can be eliminated by fully decoding the STAT(2-0) 
outputs. 
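
As a rough illustration of the counting scheme just described, the following Python 
sketch processes a list of sampled STAT(2-0) codes, one per cycle. The function names 
and the sample trace are assumptions made for this example only. 

```python
# Sketch: approximating the instruction count from sampled STAT(2-0) codes.
# Each sample is an integer 0..7 with STAT2 as the most significant bit.
# Function names and the sample trace are hypothetical.

TAKING_INTERRUPT_OR_TRAP = 0b101


def approx_instruction_count(stat_samples):
    """Count the cycles for which STAT2 is High (processor not idle)."""
    return sum(1 for s in stat_samples if s & 0b100)


def refined_instruction_count(stat_samples):
    """Same count, but with STAT(2-0) fully decoded so that
    'Taking Interrupt or Trap' cycles are excluded."""
    return sum(
        1 for s in stat_samples
        if (s & 0b100) and s != TAKING_INTERRUPT_OR_TRAP
    )


# Executing, Executing, Pipeline Hold, Taking Trap, Non-Sequential Fetch, Wait
trace = [0b111, 0b111, 0b001, 0b101, 0b110, 0b011]
```

Dividing either count by the length of the sampling interval gives the corresponding 
estimate of the instruction-execution rate. 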

The STAT2 output also may be used to implement processor timeouts for reliability. 
For example, a Low level on STAT2 may be used to start a hardware timeout counter, 
with a High level resetting and stopping the counter. If the counter exceeds a 
maximum expected count of idle cycles for a system, it is likely that an error has occurred. 
This error can be reported by the WARN trap (see Section 3.5.6 and Section 5.6). 
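
The behavior of such a timeout counter can be modeled in software as follows. This is 
a minimal sketch, assuming one STAT2 sample per processor cycle; the function name 
and the threshold parameter are hypothetical, and a real implementation would be 
external hardware. 

```python
# Sketch of the idle-timeout counter described above.
# stat2_levels: one sample of the STAT2 pin per cycle (0 = Low, 1 = High).
# max_idle_cycles: system-specific maximum expected run of idle cycles.

def idle_timeout(stat2_levels, max_idle_cycles):
    """Return the index of the cycle at which the counter first exceeds
    max_idle_cycles, or None if no timeout occurs."""
    count = 0
    for cycle, level in enumerate(stat2_levels):
        if level == 0:
            count += 1          # a Low level starts or advances the counter
            if count > max_idle_cycles:
                return cycle    # here the system would assert WARN
        else:
            count = 0           # a High level resets and stops the counter
    return None
```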

The value 010 on the STAT(2-0) outputs is used by the hardware breakpoints for 
synchronization of external hardware. If this value appears during normal processor 
operation for one cycle, a valid breakpoint comparison has been detected with the 
BSY bit being 0. The processor takes no other actions related to the breakpoint. The 
synchronization pulse can be used to trigger or synchronize external logic. 



5.3.2 CPU Control Inputs 



Certain processor operational modes are under the control of the CNTL(1-0) inputs. 
These inputs have an effect on the processor mode as follows: 



CNTL1  CNTL0  Mode 

  0      0    Load Test Instruction 
  0      1    Step 
  1      0    Halt 
  1      1    Normal 



These inputs are asynchronous to the processor clock. In addition, changes on the 
CNTL(1-0) inputs are restricted so that only CNTL1 or CNTL0, but not both, may 
change in any given processor cycle. The allowed transitions are shown in Figure 5-6. 
The restriction on CNTL(1-0) transitions allows these inputs to be driven directly by 
an external hardware-development system or tester, without any intervening logic. 
Proper operation is ensured by making only single-input changes on CNTL(1-0), and 
by restricting the interval between all changes to be greater than a processor cycle. If 
these restrictions are violated, processor operation is unpredictable, and a processor 
reset is required to resume predictable operation. 

Note that, because of the restriction described above, it is not possible to transition 
directly between all possible modes that are controlled by these inputs. For example, 
the processor cannot go from the Load Test Instruction mode to Normal operation 
without first entering the Halt or Step modes. 
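
The single-input-change rule can be expressed compactly. The Python sketch below 
assumes that every single-bit change on CNTL(1-0) is a permitted transition, which is 
consistent with the text above but is not taken from Figure 5-6 itself; the names 
used are hypothetical. 

```python
# Sketch: checking CNTL(1-0) transitions against the single-input-change rule.
# Assumption: any change of exactly one of CNTL1/CNTL0 is allowed.
CNTL_MODES = {
    0b00: "Load Test Instruction",
    0b01: "Step",
    0b10: "Halt",
    0b11: "Normal",
}


def bits_changed(old, new):
    """Number of CNTL inputs that differ between two 2-bit values."""
    return bin((old ^ new) & 0b11).count("1")


def is_valid_cntl_transition(old, new):
    """True if exactly one of CNTL1 and CNTL0 changes."""
    return bits_changed(old, new) == 1


def intermediate_modes(old, new):
    """Modes that can be passed through to get from old to new in two
    single-bit transitions."""
    return [
        name for code, name in CNTL_MODES.items()
        if is_valid_cntl_transition(old, code)
        and is_valid_cntl_transition(code, new)
    ]
```

Applied to the example in the text, a direct change from 00 to 11 is rejected, and 
intermediate_modes(0b00, 0b11) names the Step and Halt modes as the possible 
intermediate stops. 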

5.3.3 Hardware Development 

The Halt, Step, and Load Test Instruction modes of operation are defined to support 
the debug of the processor system (both hardware and software) by a hardware- 
development system. This section describes the use of these modes during debug, 
and describes the corresponding activity on the CNTL(1-0) and STAT(2-0) lines. 

5.3.3.1 HALT MODE 

The Halt mode allows the hardware-development system to stop processor operation 
while preserving its internal state. The Halt mode is defined so that normal operation 
may resume from the point at which the processor enters the Halt mode. All external 
accesses are completed before the Halt mode is entered, so a minimum amount of 
system logic is required to support the Halt mode. 

Figure 5-6 Valid Transitions on CNTL(1-0) Inputs 




The Halt mode can be invoked by applying a value of 10 to the CNTL(1-0) inputs. 
The processor enters the Halt mode within two or three cycles after the CNTL(1-0) 
inputs are changed (depending on synchronization time), except that it first completes 
any external data access in progress. 

The Halt mode can also be entered as the result of executing a HALT instruction or 
encountering a hardware breakpoint with a hardware-development system attached 
(see below). When a HALT instruction is executed or a breakpoint is encountered, the 
processor enters the Halt mode on the next cycle, except that it completes any 
external data accesses in progress. In this case, the processor remains in the Halt mode 
even though the CNTL(1-0) inputs are 11. However, the processor cannot exit the 
Halt mode except as the result of the CNTL(1-0) or RESET inputs. If the instruction 
following a Halt instruction has an exception (e.g., instruction TLB Miss), the trap 
associated with the exception is taken before the processor enters the Halt mode. 

The Halt instruction is designed to be used as an instruction breakpoint by the 
hardware-development system, augmenting the hardware breakpoints provided by the 
Am29050 microprocessor. However, the Halt instruction normally is a privileged 
instruction, causing a Protection Violation trap upon attempted execution by a 
User-mode program. The hardware-development system can disable this Protection 
Violation by holding the CNTL(1-0) inputs at 10 during a reset; this signals the presence of 
an external debugger and disables protection checking for Halt instructions until the 
next processor reset. 

If an external hardware debugger has signaled its presence, any condition that would 
otherwise cause the processor to take a Monitor trap instead causes the processor to 
enter the Halt Mode at location 16 in Instruction ROM address space (the WARN Trap 
handler). This permits the hardware-development system to debug system-level 
routines. If the processor enters the Halt Mode due to a synchronous trap, the Reason 
Vector Register is updated, and the MM bit of the Current Processor Status Register 
is set. 

If an external debugger has signaled its presence and a valid breakpoint comparison 
is encountered, the processor enters the Halt Mode at the beginning of the Trace 
Trap handler. The Shadow Program Counter registers point to the location where the 
breakpoint was encountered. 

If a burst-mode instruction access is established before the processor enters the 
Halt mode, it remains established when the processor enters the Halt mode, but is 
suspended. 

While in the Halt mode, the processor does not execute instructions, and performs no 
external accesses. The Timer Facility does not operate (i.e., the Timer Counter Regis- 
ter does not change). 

The Halt mode is exited whenever the Reset mode is entered, or the CNTL(1-0) lines 
place the processor into another mode. The only valid transitions on the CNTL(1-0) 
lines from the value of 10 are to the value 00, which places the processor into the 
Load Test Instruction mode, and to the value 11, which causes the processor to 
resume normal execution. 

5.3.3.2 STEP MODE 

The Step mode causes the Am29050 microprocessor to execute at a rate determined 
by the hardware-development system, allowing the hardware-development system to 
easily control and monitor processor operation. The Step mode is defined so that 
normal operation may resume after stepping is complete. Since all external accesses 
are completed during any step, a minimum amount of system logic is required to 
support the slower rate of execution. 

The Step mode is invoked by the application of a value of 01 to the CNTL(1-0) inputs. 
The processor enters the Step mode within two or three cycles after the CNTL(1-0) 
inputs are changed (depending on synchronization time), except that it first completes 
any external data access in progress. 

If a burst-mode instruction access is established before the processor enters the 
Step mode, it remains established when the processor enters the Step mode, but is 
suspended. 

While in the Step mode, the processor does not execute instructions, and performs no 
external accesses. The Timer Facility does not operate (i.e., the Timer Counter Regis- 
ter does not change) while the processor is in the Step mode. 

The Step mode is identical to the Halt mode in every respect except one. This 
difference is apparent on the transition of the CNTL(1-0) lines from the value 01 (Step 
mode) to the value 11 (Normal). On this transition, the processor steps. That is, the 
processor state advances by one pipeline stage, and it completes any external 
access which is initiated by this state change. 

If the processor immediately enters the Pipeline Hold mode on a step, the step may 
require multiple cycles to execute, since the processor pipeline cannot advance while 
the processor is in the Pipeline Hold mode. The STAT(2-0) lines reflect the state of 
the processor for every cycle of the step; STAT2 is High for one cycle, and only one 
cycle, before the step completes. 

The Timer Counter decrements by one for every cycle of the step; if the Timer 
Counter decrements to zero, the usual Timer-Facility actions are performed, and a 
Timer interrupt may occur. 

After the step is performed, the processor re-enters the Step mode, and remains in 
the Step mode even though the CNTL(1-0) inputs have the value 11 (this avoids 
the need for a time-critical transition on the CNTL(1-0) inputs). The processor 
remains in this condition until the CNTL(1-0) inputs transition to 10 or 01 (or RESET is 
asserted). The transition to 10 causes the processor to enter the Halt mode, and is 
used to clear the Step mode. The transition to 01 causes the processor to remain in 
the Step mode, so that it may perform additional steps. 

5.3.3.3 HALT/STEP MODE AND LOADM/STOREM 

If the Am29050 microprocessor is placed in the Halt or Step mode while either a 
LOADM or STOREM instruction is being executed, the STAT(2-0) outputs indicate 
the Halt or Step mode for one cycle (STAT(2-0) = 000), and then indicate the Pipeline 
Hold mode (STAT(2-0) = 001) until the final access of the LOADM or STOREM is 
complete, at which time they return to indicating the Halt or Step mode. A 
hardware-development system must therefore not treat a single-cycle Halt/Step mode 
indication on the STAT(2-0) outputs as an indication that the processor is halted. 
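
One way for a hardware-development system to apply this rule is to require the 
Halt/Step code on more than one consecutive cycle before concluding that the 
processor is halted. The Python sketch below models that filter; the two-cycle 
threshold and the function name are assumptions for illustration, not values taken 
from this manual. 

```python
# Sketch: ignoring a single-cycle Halt/Step indication on STAT(2-0).
# stat_samples: one 3-bit STAT code per cycle.
# min_cycles: assumed number of consecutive 000 samples required before the
#             processor is treated as halted (2 is an illustrative choice).

HALT_OR_STEP = 0b000


def first_confirmed_halt(stat_samples, min_cycles=2):
    """Return the index of the cycle at which the Halt/Step code has been
    seen for min_cycles consecutive cycles, or None if that never happens."""
    run = 0
    for cycle, code in enumerate(stat_samples):
        run = run + 1 if code == HALT_OR_STEP else 0
        if run >= min_cycles:
            return cycle
    return None
```

In a trace such as Executing, 000, 001, 001, 000, 000, the lone 000 that precedes the 
Pipeline Hold cycles is skipped, and the halt is confirmed only on the final sample. 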

5.3.3.4 LOAD TEST INSTRUCTION MODE 

The processor incorporates an Instruction Register (IR) that holds instructions while 
they are decoded. In the Load Test Instruction mode, the IR is enabled to receive the 
content of the Instruction Bus, regardless of the state of the processor's Instruction 
Fetch Unit. This allows the hardware-development system to provide instructions for 
execution directly, thereby providing means for the hardware-development system to 
examine and modify the internal state of the processor without altering the proces- 
sor's instruction stream. 

The hardware-development system can place an instruction in the IR by first placing 
00 on CNTL(1-0). The processor enters the Load Test Instruction mode within two or 
three cycles after the CNTL(1-0) inputs are changed (depending on synchronization 
time), except that it first preempts any established burst-mode instruction access. The 
Load Test Instruction mode can be entered only from the Halt or Step modes. Note 
that the burst-mode instruction access that is preempted here was previously sus- 
pended for the Halt or Step modes. 

When the processor enters the Load Test Instruction Mode, the processor behaves 
as though the Current Processor Status Register were forced to the value shown in 
Figure 5-7, even though the register is not changed. 



Figure 5-7 Processor Status While in Load Test Instruction Mode 

The visible processor state, including the Shadow Program Counter Registers, 
remains unchanged while the processor is in the Load Test Instruction Mode. The 
processor status shown in Figure 5-7 remains in effect until the next transition to the 
Normal Mode via the Halt Mode. 

While the processor is in the Load Test Instruction mode, it ignores all interrupts and 
traps, except for the Data Access Exception and Coprocessor Exception. These latter 
exceptions are also ignored if the load or store which causes the exception has the 
value 110 for the OPT bits (indicating a load or store to the hardware-development 
system). 

The STAT(2-0) lines have a value of 010 while the processor is in the Load Test 
Instruction mode; this may be used as a verification that the processor is loading 
the IR. 

While the processor is in the Load Test Instruction mode, the IR continually stores 
the value on the Instruction Bus; any change in the value on this bus is reflected in 
the IR on the next cycle. The hardware-development system can place a desired 
instruction into the IR by driving this instruction on the Instruction Bus. The values of 
IRDY and IERR are irrelevant. 

The processor exits the Load Test Instruction mode in the second cycle following a 
change on the CNTL(1-0) inputs. The only valid change here is either to the Halt 
mode (CNTL(1-0) = 10) or the Step mode (CNTL(1-0) = 01). 

When the Load Test Instruction mode is exited, the most recent value stored into the 
IR is held. If the processor is placed in the Step mode, the IR is marked as having 
valid content, enabling the processor to decode and execute the instruction. If the 
processor is placed in the Halt mode, it ignores any instruction placed in the IR by the 
Load Test Instruction mode, and reverts to its normal instruction-fetch mechanism. 

Once the IR has been set by the Load Test Instruction mode, the instruction in the IR 
may be executed via the Step mode as discussed in the previous subsection. A single 
step is sufficient to cause the execution of this instruction. However, because of 
pipelining, multiple steps may be required before the instruction completes execution. 
If more than one step is performed, the processor executes the instruction in the IR on 
every step. If it is desired to step an instruction to completion without repeated execu- 
tion, a NO-OP may be set into the IR (using the Load Test Instruction mode) after the 
first step. 

The Load Test Instruction mode may be used to cause the execution of most proces- 
sor instructions (restrictions are discussed below). This allows inspection and modifi- 
cation of processor state. 

The hardware-development system uses load and store instructions, executed via the 
Load Test Instruction mode, to alter and inspect the contents of general-purpose 
registers. The OPT field for these loads and stores has the value 110; this causes 
the system to ignore the resulting access. Furthermore, it causes the Am29050 
microprocessor to ignore the DRDY and DERR responses for the access; the Am29050 
microprocessor completes the access at the end of the next stepped instruction, 
rather than upon the assertion of DRDY. This eliminates the need for the 
hardware-development system to generate a synchronous DRDY in response to the load or 
store. 

Because of sequencing constraints, the Load Test Instruction mode cannot be used 
to cause the execution of the following instructions: conditional jumps, Load Multiple, 
Store Multiple, Interrupt Return, and Interrupt Return and Invalidate. Unconditional 
jumps and calls are permitted, but affect only the Program Counter (instruction se- 
quencing is not affected). 

It is not possible to execute a load directly following a store — nor a store directly 
following a load—using the Load Test Instruction mode. At least one NO-OP (or other 
operation) must be executed between adjacent loads and stores, because of control 
conflicts that arise when these instructions are stepped in a system that performs the 
resulting accesses at normal speed. However, a sequence of only loads or only 
stores is permitted without restriction. 

The contents of the Program Counter 0, Program Counter 1, Program Counter 2, 
Channel Address, Channel Data, Channel Control, and ALU Status registers are not 
updated while instructions are executed via the Load Test Instruction mode, except 
explicitly by Move To Special Register instructions. Instructions executed using the 
Load Test Instruction mode may access protected processor state even though the 
processor is in the User mode. 

Instructions executed via the Load Test Instruction mode may be used to access an 
external device or memory. Recall that the processor completes any normal data 
access before completing a step. This allows the processor to access devices and 
memories on behalf of the hardware-development system, and simplifies the timing 
constraints on the hardware-development system. 

During processor execution via the Load Test Instruction mode, the processor retains 
the information required to resume normal operation. If any processor state is 
modified by the hardware-development system, this state must be restored properly for 
normal operation to resume properly. 

In order to leave the Load Test Instruction mode and resume normal execution, an 
IRET instruction is placed into the IR and stepped through the processor pipeline. 
When the IRET instruction is executed, the processor re-fetches the instructions at 
the addresses in the Shadow Program Counter 0 and Shadow Program Counter 1 
registers. Following this, a transition on CNTL(1-0) to the Halt mode (CNTL(1-0) = 10) 
and then to the Normal mode (CNTL(1-0) = 11) causes the processor to leave the 
Load Test Instruction mode and resume normal operation. Alternatively, the hard- 
ware-development system can continue to use the Step mode to maintain control of 
the processor and step through its normal execution sequence. 

5.3.3.5 SUMMARY OF DEVELOPMENT SYSTEM OPERATION 

When the capabilities provided by the Halt, Step, and Load Test Instruction Register 
modes are combined, an extremely flexible test and development interface results. 
The following is an example sequence performed by the hardware-development 
system during debug: 

1. Halt the processor either by a HALT instruction, by the hardware breakpoints, or 
by a 10 on the CNTL(1-0) inputs. The HALT instruction may be used as a 
primitive operation in the implementation of a general instruction-breakpoint 
capability. 

2. Load the IR with an instruction to inspect or alter the processor state. The 
hardware-development system should wait for the value 010 on STAT(2-0) (Load 
Test Instruction mode) before driving the Instruction Bus. After the IR is loaded, 
the hardware-development system sets CNTL(1-0) to 01 (Step mode). 

3. Step the processor by a transition of CNTL(1-0) from 01 to 11 and back to 01. 
Data may be supplied on the Data Bus during one of the steps to satisfy a load 
operation; the data must be held valid until the stepped instruction completes. 

4. Repeat steps 2 and 3 as desired. Finally, perform steps 2 and 3 using an IRET 
instruction. 

5. After the final step, enter the Halt mode by placing 10, instead of 01, on 
CNTL(1-0). 

6. Resume normal execution by placing 11 on CNTL(1-0). 
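
The sequence above can be written out as the series of values that a 
hardware-development system would place on CNTL(1-0). The following Python sketch is 
a simplified model: it assumes the processor starts in Normal operation, that Halt is 
entered by way of CNTL(1-0), and that each loaded instruction needs exactly one step 
(in practice, additional steps with a NO-OP in the IR may be needed). The function 
names are hypothetical. 

```python
# Sketch: CNTL(1-0) values for the example debug sequence above.
# Simplifying assumptions: one step per loaded instruction, and the last
# loaded instruction is the IRET that precedes the return to normal execution.
LOAD_TEST, STEP, HALT, NORMAL = 0b00, 0b01, 0b10, 0b11


def debug_session_cntl(num_instructions):
    """Return the CNTL(1-0) values driven for a session that loads and
    steps num_instructions instructions (the final one being IRET)."""
    if num_instructions < 1:
        raise ValueError("at least the final IRET must be stepped")
    seq = [HALT]                                   # item 1: halt
    for i in range(num_instructions):
        last = i == num_instructions - 1
        seq.append(LOAD_TEST)                      # item 2: load the IR
        seq.append(STEP)                           # item 2: enter Step mode
        seq.append(NORMAL)                         # item 3: 01 -> 11 steps
        seq.append(HALT if last else STEP)         # item 3 or item 5
    seq.append(NORMAL)                             # item 6: resume
    return seq


def uses_single_bit_changes(seq, start=NORMAL):
    """Check that every change in the sequence alters exactly one CNTL input."""
    prev = start
    for value in seq:
        if bin(prev ^ value).count("1") != 1:
            return False
        prev = value
    return True
```

Every change in the generated sequence alters only one of CNTL1 and CNTL0, so the 
sequence respects the transition restriction described in Section 5.3.2. 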

5.3.4 Hardware Testing 



The Test mode in the Am29050 microprocessor allows processor outputs to be driven 
directly for testing or diagnostic purposes. The Test mode places all processor out- 
puts (except MSERR) into the high-impedance state, so that they do not interfere 
electrically with externally supplied signals. In all other respects, processor operation 
is unchanged. 



The Test mode is invoked by an active level on the TEST input, regardless of the 
processor's operational mode (for example, the Test mode is not affected by the Halt 
mode). The disabling of processor outputs is performed combinatorially, and is asyn- 
chronous to SYSCLK. 

For some outputs, the transition to the high-impedance state that results from the Test 
mode may occur at a much slower rate than applies during normal system operation 
(for example, when the processor relinquishes the channel to another master). For 
this reason, the Test mode may not be appropriate for special user-defined purposes. 

Note that SYSCLK is also placed in the high-impedance state by the Test mode. This 
allows the testing of external clock-distribution circuits, but care must be taken to 
ensure that a high-impedance SYSCLK output does not have an adverse effect on the 
system. Furthermore, if SYSCLK is disabled, and a signal is not externally supplied, 
processor state may be lost. 

5.4 EXTERNAL INTERRUPTS AND TRAPS 



An external device causes an interrupt by asserting one of the INTR(3-0) inputs, and 
causes a trap by asserting one of the TRAP(1-0) inputs. Transitions on each of these 
inputs may be asynchronous to the processor clock; they are protected against 
metastable states. For this reason, an assertion of one of these inputs that meets the 
proper set-up-time criteria does not cause the corresponding interrupt or trap until the 
second following cycle. 



The INTR(3-0) inputs are prioritized with respect to each other and with respect to the 
processor. To resolve conflicts between these inputs, the inputs are prioritized in 
order, so that the interrupt caused by INTR0 has the highest priority, and the interrupt 
caused by INTR3 has the lowest priority. 



The interrupts caused by INTR(3-0) may be masked by the Disable Interrupts (DI) or 
Disable All Interrupts and Traps (DA) bits of the Current Processor Status Register. In 
addition, the Interrupt Mask (IM) field of the Current Processor Status Register sets 
the priority of the processor with respect to these inputs. The IM field enables the 
INTR(3-0) inputs as follows: 



IM Value  Result 

   00     INTR0 enabled 
   01     INTR(1-0) enabled 
   10     INTR(2-0) enabled 
   11     INTR(3-0) enabled 



Note that the interrupt caused by the INTR0 input cannot be disabled by the IM field. 



If one of the INTR(3-0) inputs is active, and the resulting interrupt is disabled by the 
DA bit, DI bit or IM field, the Interrupt Pending (IP) bit of the Current Processor Status 
Register is set. The IP bit is reset if the interrupt is enabled, or if all disabled external 
interrupts are de-asserted. 
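
The masking and priority rules for INTR(3-0) can be summarized with a small model. 
The Python sketch below is purely combinational and illustrative: it does not model 
synchronization latency or the changes the processor makes to its status bits when 
an interrupt is taken, and the function name and argument layout are assumptions. 

```python
# Sketch: which INTR(3-0) interrupt is selected, and the resulting IP bit.
# intr_active: 4-bit mask, bit n set means INTRn is asserted.
# im: 2-bit Interrupt Mask field; di, da: Disable bits (0 or 1).

def resolve_intr(intr_active, im, di=0, da=0):
    """Return (selected, ip): selected is the number of the highest-priority
    enabled interrupt (or None), and ip is the Interrupt Pending bit."""
    if di or da:
        enabled_mask = 0b0000                  # DI or DA masks all INTR inputs
    else:
        enabled_mask = (1 << (im + 1)) - 1     # IM = 00 enables only INTR0
    candidates = intr_active & enabled_mask
    selected = None
    for n in range(4):                         # INTR0 has the highest priority
        if (candidates >> n) & 1:
            selected = n
            break
    # IP is set while any asserted input is disabled by DA, DI, or IM.
    ip = 1 if (intr_active & ~enabled_mask & 0b1111) else 0
    return selected, ip
```

For instance, with INTR3 and INTR1 both asserted and IM = 01, the sketch selects 
INTR1 and reports IP = 1, because INTR3 is asserted but not enabled. 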

The TRAP(1-0) inputs are prioritized with respect to each other, so that the trap 
caused by TRAP0 has priority over the trap caused by TRAP1 when a conflict occurs. 
Both TRAP0 and TRAP1 have priority over the INTR(3-0) inputs. The TRAP(1-0) 
inputs cannot be disabled selectively. Both traps, however, can be disabled by the DA 
bit in the Current Processor Status Register. 



The INTR(3-0) and TRAP(1-0) inputs are level-sensitive. Once asserted, they must 
be held active until the corresponding interrupt or trap is acknowledged by the inter- 
rupt or trap handler (this acknowledgment is system-dependent, since there is no 
interrupt-acknowledge mechanism defined for the processor). 

If any of these inputs is asserted, then de-asserted before it is acknowledged, it is 
not possible to predict (unless the interrupt or trap is masked) whether or not the 
processor has taken the corresponding interrupt or trap. During interrupt and trap 
processing, the vector number is determined in part by which of the INTR(3-0) and 
TRAP(1-0) inputs is active. If the input causing an interrupt or trap is de-asserted 
before the vector number is determined, the vector number is unpredictable, with the 
result that processor operation is also unpredictable. 



There is a three-cycle latency from the de-assertion of an INTR(3-0) or TRAP(1-0) 
input to the time that the corresponding interrupt or trap is actually not recognized by 
the processor. The de-assertion must be timed so that, when the corresponding mask 
is reset, the processor does not recognize the interrupt or trap. Otherwise, a spurious 
interrupt or trap may occur. 



5.5 PROCESSOR RESET 



When power is first applied to the processor, it is in an indeterminate state, and must 
be placed in a known state. Also, under certain circumstances, it may be necessary to 
place the processor in a defined state. This is accomplished by the Reset mode, 
which places the processor into a pre-defined state (see Section 3.9). 

The Reset mode is invoked by asserting the RESET input, and can be entered only if 
the SYSCLK pin is operating normally, whether or not the SYSCLK pin is being driven 
by the processor (see Section 5.7). The Reset mode is entered within four processor 
cycles after RESET is asserted. The RESET input must be asserted for at least four 
processor cycles to accomplish a processor reset. 

The Reset mode can be entered from any other processor mode (e.g., the Reset 
mode can be entered from the Halt mode). If the RESET input is asserted at the time 
that power is first applied to the processor, the processor enters the Reset mode only 
after four cycles have occurred on the SYSCLK pin. 



The Reset mode is exited when the RESET input is de-asserted. Either three or four 
cycles after RESET is de-asserted (depending on internal synchronization time), the 
processor performs an initial instruction access on the channel. The initial instruction 
access is directed to address 0 in the instruction read-only memory (instruction ROM). 
If instruction ROM is not implemented in a particular system, another device or 
memory must respond to this instruction fetch. 



If the CNTL(1-0) inputs are 10 or 01 when RESET is de-asserted, the processor 
enters the Halt or Step mode, respectively. If the processor enters the Halt mode 
immediately after reset, the protection checking that normally applies to the Halt 
instruction is disabled, so that the Halt instruction can be used as an instruction 
breakpoint in a User-mode program. The Load Test Instruction mode cannot be 
directly entered from the Reset mode. If the CNTL(1-0) inputs are 00 immediately 
after RESET is de-asserted, the effect on processor operation is unpredictable. If the 
CNTL(1-0) inputs are 11, the processor enters the Executing mode. 



The processor samples the STAT0 output internally when RESET is asserted. A High 
level on STAT0 in this case is used to enable a special test configuration, and causes 
the processor to be inoperable. When RESET is asserted, the processor drives STAT0 
Low in order to disable this test configuration. However, if processor outputs are 
disabled by the Test mode, the processor is not able to drive STAT0. Thus, if RESET 
is asserted when the processor is in the Test mode, the STAT0 pin must be driven 
Low externally. (In a master/slave configuration, as described in Section 5.8, STAT0 
is driven Low by the master processor when RESET is asserted.) 



5.6 WARN INPUT 



An inactive-to-active transition on the WARN input causes a WARN trap to be taken 
by the processor. The WARN trap cannot be disabled; the processor responds to the 
WARN input regardless of its internal condition, unless the RESET input also is 
asserted. This input is provided so that the system can gain control of the processor in 
extreme situations, such as when system power is about to be removed or when a 
severe non-recoverable error occurs. 



The WARN input is edge-sensitive, so that an active level on the WARN input for 
long intervals does not cause the processor to take multiple WARN traps. However, 
WARN must be held active for at least four cycles in order to be properly recognized by 
the processor. The processor still takes the WARN trap if WARN is de-asserted after 
four cycles. Another WARN trap occurs if WARN makes another inactive-to-active 
transition. 
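
The edge-sensitive behavior can be illustrated with a cycle-by-cycle model. In the 
Python sketch below, the input is represented as a list of logical values (1 = WARN 
active), one per cycle. Treating pulses shorter than four cycles as unrecognized is an 
assumption made for this example, since the text only states the minimum for proper 
recognition; the function name is hypothetical. 

```python
# Sketch: counting WARN traps from a per-cycle trace of the WARN input.
# warn_active: list of 0/1 values, 1 meaning WARN is at its active level.
# Assumption: a pulse shorter than min_cycles is modeled as not recognized.

def count_warn_traps(warn_active, min_cycles=4):
    """Return the number of WARN traps taken for the given trace."""
    traps = 0
    run = 0                        # consecutive cycles at the active level
    for level in warn_active:
        if level:
            run += 1
            if run == min_cycles:  # counted once per active interval
                traps += 1
        else:
            run = 0                # the next active level is a new transition
    return traps
```

A long active interval therefore produces a single trap, and a second trap requires 
the input to return to its inactive level first. 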



The processor enters the Executing mode when the WARN input is asserted, 
regardless of its previous operational mode. Either seven or eight cycles after WARN is 
asserted (depending on internal synchronization time), the processor performs a 
trap-handler instruction access on the channel. This instruction access is directed to 
address 16 in the instruction read-only memory (instruction ROM). If instruction ROM 
is not implemented in a particular system, another device or memory must respond to 
this instruction fetch. 

If the CNTL(1-0) inputs are 10 or 01 when the trap-handler instruction fetch 
completes, the processor enters the Halt or Step mode, respectively. Before the 
completion of this instruction fetch, the CNTL(1-0) inputs are irrelevant, except that the Load 
Test Instruction mode cannot be entered directly after a WARN trap is taken. If the 
CNTL(1-0) inputs are 00 immediately after WARN is de-asserted, the effect on 
processor operation is unpredictable. If the CNTL(1-0) inputs are 11, the processor 
remains in the Executing mode. 



5.7 CLOCKS 



The Am29050 microprocessor supports two methods of system-clock generation and 
distribution. In one arrangement, the processor generates a clock for the system at its 
operating frequency; this clock appears on the SYSCLK pin, and may be distributed 
externally to other system components. In the second arrangement, the system pro- 
vides its own clock generation and distribution; in this case, the processor receives 
the externally generated clock on the SYSCLK pin. 

In both arrangements, the circuits that generate and buffer SYSCLK are designed to 
minimize the apparent skew between internal processor clocks and external system 
clocks. 

The processor provides a power-supply pin named PWRCLK for the SYSCLK driver 
that is independent of all other chip power distribution. The separate PWRCLK supply 
electrically isolates other processor circuits from noise which might be induced on the 
power supply by the SYSCLK driver. The PWRCLK pin also is used to decide be- 
tween the two possible clocking arrangements. 

5.7.1 Processor-Generated Clock 

If power (i.e., +5 volts) is applied to the PWRCLK pin, the processor is configured to 
generate clocks for the system. In this case, the SYSCLK pin is an output, and the 
signal on INCLK is used to generate the system clock. The processor divides the 
INCLK signal by two in the generation of SYSCLK, so INCLK should be driven at 
twice the processor's operating frequency. 

5.7.2 System-Generated Clock 

If the PWRCLK pin is grounded, the processor is configured to receive an externally 
generated clock. In this case, the SYSCLK pin is an input used directly as the proces- 
sor clock. SYSCLK should be driven at the processor's operating frequency. In this 
configuration, the INCLK input should be tied High or Low, except in certain master/ 
slave configurations as discussed in Section 5.8. 

5.7.3 Clock Synchronization 

The SYSCLK pin is at a High level during the first half of the processor cycle, and at a 
Low level during the second half of the processor cycle. Thus, a processor cycle 
begins on a Low-to-High transition of SYSCLK. The definition of the beginning of the 
processor cycle is independent of the clocking arrangement chosen for a particular 
system. 

In some systems, it might be desirable to have two or more processors operate in
lock-step synchronization, with each processor driven by a common INCLK signal. In
this case, synchronization of the processors is achieved by the RESET input. If the
de-assertion of RESET meets a specified set-up time with respect to the High-to-Low
transition of INCLK, the SYSCLK output is guaranteed to be Low after the second 
following rising edge of INCLK. Thus, all processors may be synchronized as re- 
quired. 

5.7.4 Electrical Specifications 

The electrical specifications for SYSCLK are different from the specifications for most
other processor inputs and outputs. In order to reduce clock-skew effects, the 
SYSCLK pin is electrically compatible with the processor's CMOS circuits, rather than 
being compatible with transistor-transistor-logic (TTL) circuits. 

Note that the SYSCLK pin is placed in the high-impedance state by the Test mode. If 
an externally generated clock is not supplied in this case, processor state may be lost.

5.8 MASTER/SLAVE CHECKING

Each Am29050 microprocessor output has associated logic which compares the 
signal on the output with the signal that the processor is providing internally to the 
output driver. The comparison between the two signals is made any time a given 
driver is enabled, and any time the driver is disabled only because of the Test mode. 
If, when the comparison is made, the output of a driver does not agree with its input, 
the processor asserts the MSERR output on the second following cycle. 

When the processor asserts MSERR, it takes no other actions with respect to the 
detected miscomparison. In particular, no traps occur. However, MSERR may be
used externally to perform any system function, including the generation of a trap. 
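
The timing of MSERR relative to a miscomparison can be illustrated with a small cycle-level sketch. It is a simplified model under stated assumptions: each list holds one value per cycle for a single output, `checked` is a hypothetical flag that is true whenever the comparison is made (driver enabled, or disabled only because of the Test mode), and the function name `mserr_trace` is invented for this example.

```python
def mserr_trace(driven, observed, checked):
    """Return per-cycle MSERR values for one output.

    A mismatch detected in cycle n asserts MSERR in cycle n + 2
    (the second following cycle). No trap is modeled, because the
    processor itself takes no other action.
    """
    n = len(driven)
    mserr = [False] * (n + 2)
    for cycle in range(n):
        if checked[cycle] and driven[cycle] != observed[cycle]:
            mserr[cycle + 2] = True
    return mserr
```

With a mismatch in cycle 1, the returned trace is true only at index 3.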

5.8.1 Master/Slave Operation 

If there is a single processor in the system, the MSERR output indicates that a proc- 
essor driver is faulty, or that there is a short-circuit in a processor output. However, a 
much higher level of fault detection is possible if a second processor (called a slave) 
is connected in parallel with the first (called a master), where the slave processor has
outputs disabled by the Test mode. 

The slave processor, by comparing its outputs to the outputs of the master processor, 
performs a comprehensive check of the operation of the master processor. In addi- 
tion, if the slave processor is connected at the proper position on the channel, it may 
detect open circuits and other faults in the electrical path between the master proces- 
sor and its local devices and memories. Note that the master processor still performs 
the comparison on its outputs in this configuration. 

5.8.2 Preventing Spurious Errors 

When two processors are connected in a master/slave configuration, it is necessary
to prevent spurious assertions of MSERR. These result from situations where the 
outputs of the slave processor do not agree with the outputs of the master processor, 
but both processors are operating correctly. 

There are several potential sources of spurious errors in a master/slave configuration 
that are avoided by the Am29050 microprocessor design: 

1. Unimplemented bits in processor registers that are reflected on processor outputs.
This is avoided in the Am29050 microprocessor design by having all 
unimplemented bits be read as 0. 



2. Unpredictable values for channel signals. If a DERR or IERR response is asserted
in response to an access, the Data Bus or Instruction Bus may be at an indeterminate
level (e.g., high-impedance), causing the master and slave processors to
detect different values. If these values are later reflected on processor outputs, a
spurious MSERR assertion may occur. The Am29050 microprocessor avoids this
problem by ignoring the instruction or data word returned with DERR or IERR.

3. Unpredictable power-up state that is reflected on processor outputs. The
Am29050 microprocessor avoids this problem upon reset by forcing to a known 
value any state that might be reflected on outputs before the completion of 
initialization. 

Another source of spurious errors is a lack of synchronization between the master 
and slave processors. To maintain synchronization between the master and slave 
processors, it is first necessary that they operate with identical clocks. This is
accomplished by having the master processor drive SYSCLK, with the slave processor
receiving SYSCLK as an input, or by driving both processors' SYSCLK inputs with
the same externally generated clock.

However, the fact that both processors operate with the same clock is not sufficient to 
guarantee synchronization. Asynchronous processor inputs, if they are truly asynchronous
to the operation of the master and slave processors, may affect the master
processor a cycle sooner or later than they affect the slave processor. For this reason,
the relevant asynchronous inputs (i.e., WARN, INTR(3-0), TRAP(1-0), CNTL(1-0) and
RESET) must be externally synchronized to both the master and slave processors.
Note that in the case of RESET, only the active-to-inactive transition must be 
synchronized. 

5.8.3 Switching Master and Slave Processors 

In some master/slave configurations, it might be desirable to give the slave processor 
control over the system when an error is isolated to the master processor. It is possible
to grant control of the system to the slave processor by taking it out of the Test
mode, and placing the master processor into the Test mode. Note that synchronization
must be maintained when this is accomplished (e.g., using the Halt mode).

If the original master processor is configured to generate SYSCLK in this case, the 
slave processor also must generate SYSCLK when it becomes a master. Because of 
this, the INCLK signal must be supplied to both the master and slave processors, with 
both processors being configured to generate clocks. 

In this master/slave configuration, the slave processor still receives SYSCLK from the 
master processor as described previously. The slave processor does not drive 
SYSCLK because of the Test mode. However, when the slave processor is taken out 
of the Test mode, it is able to drive SYSCLK as required. 

Note that this processor-switching scheme may be generalized to more than two 
processors. 

CHAPTER 6



COPROCESSOR INTERFACE 



A coprocessor for the Am29050 microprocessor is an off-chip extension of the processor's
execution unit. The Am29050 microprocessor communicates with the coprocessor
using a mechanism that is very similar to the mechanism used to communicate
with other external devices and memories. However, because the coprocessor
extends the instruction-execution capabilities of the processor, transfers to and
from the coprocessor are in terms of operands, operation codes, results, and status
information. This is in contrast to the address and data transfers that occur for other types
of external accesses. This chapter describes the coprocessor interface, both from a
software and a hardware point of view.

6.1 COPROCESSOR PROGRAMMING 

6.1.1 Overview of Coprocessor Operations

A program executes the following steps to perform a coprocessor operation. This 
sequence is intended only as a guide, since there are many possible variations: 

1. Send operands to the coprocessor. The number of transfers to the coprocessor
depends on the number of operands, and the length of each operand. As many as 
64 bits of information can be transferred in a single cycle. 

2. Send an operation code and other operation information to the coprocessor. The
operation can be specified by as many as 64 bits of information.

3. Start the coprocessor operation. This can occur simultaneously with the 
operation-code transfer of step 2. 

4. Read the coprocessor results. The number of transfers from the coprocessor 
depends on the number of results, and the length of each result. 
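
The four steps above can be sketched with a toy software model. Everything in it is an assumption made for illustration: the class `ToyCoprocessor`, the OPT encodings, and the operation codes are hypothetical, since the manual leaves those definitions to each coprocessor.

```python
OPT_OPERANDS = 0   # hypothetical OPT value: the transfer carries operands
OPT_OPCODE = 1     # hypothetical OPT value: the transfer carries an operation code
OP_ADD, OP_MUL = 0, 1  # hypothetical operation codes

MASK32 = 0xFFFFFFFF


class ToyCoprocessor:
    def __init__(self):
        self.operands = (0, 0)
        self.opcode = OP_ADD
        self.result = None

    def store(self, word_a, word_b, opt, start=False):
        """Accept up to 64 bits (two 32-bit words); start=True models TC = 1."""
        if opt == OPT_OPERANDS:
            self.operands = (word_a & MASK32, word_b & MASK32)   # step 1
        elif opt == OPT_OPCODE:
            self.opcode = word_a                                 # step 2
        if start:                                                # step 3
            a, b = self.operands
            value = a + b if self.opcode == OP_ADD else a * b
            self.result = value & MASK32

    def load(self):
        """Return one 32-bit result (step 4)."""
        return self.result


cp = ToyCoprocessor()
cp.store(6, 7, OPT_OPERANDS)                # step 1: send both operands at once
cp.store(OP_MUL, 0, OPT_OPCODE, start=True) # steps 2 and 3 in a single transfer
product = cp.load()                         # step 4: read the result
```

The model completes the operation immediately; a real coprocessor would take some number of cycles, during which the processor may continue executing.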

The above sequence is defined so that coprocessor operations may be concurrent 
with other processor operations, including external accesses. This is possible be- 
cause coprocessor operations are decoupled from the transfer of information to and 
from the coprocessor. Once the operation is started, in step 3, the processor may 
continue further execution, overlapped with coprocessor execution, until the 
coprocessor results are read. 

Because the Am29050 microprocessor implements overlapped loads, it can continue
execution after attempting to read a coprocessor result. However, if the processor
attempts to use the result before the operation is complete, the processor enters the
Pipeline Hold mode until the operation is complete.

In certain circumstances, it may be desired to perform multiple coprocessor operations
before any results are read. For example, certain array computations form a
single result from more than one operation. In this case, steps 1 through 3 above may
be repeated, in any combination desired and as many times as desired, before
results are read. The coprocessor interface allows the coprocessor to prevent the
transfer of operands and/or operation codes if it is not prepared to receive them.

6.1.2 Coprocessor Transfers

All coprocessor transfers occur between general-purpose registers and the 
coprocessor. The transfers occur as the result of the execution of load and store 
instructions for which the Coprocessor Enable (CE) bit has a value of 1. For a store, the
information transferred to the coprocessor is given either by the contents of two
general-purpose registers, or by the contents of a general-purpose register and an
8-bit constant. For a load, information is transferred into a single general-purpose
register in the Am29050 microprocessor.

The coprocessor model includes no provision for addressing. Although it is possible to 
extend the coprocessor interface to include addressing, addressing is more appropri- 
ately handled by normal external accesses defined for the processor (such as input/ 
output). 

The format of the instructions that transfer information to and from a coprocessor is 
shown in Figure 6-1.



Figure 6-1 

For coprocessor stores, the RA and RB or I fields specify the source of data to be
transferred to the coprocessor. The RA field specifies a general-purpose register 
whose contents are transferred to the coprocessor. The RB or I field specifies either a 
general-purpose register whose contents are transferred to the coprocessor, or a 
zero-extended constant that is transferred to the coprocessor. For the latter, the M bit 
of the operation code (bit 24) determines whether the register or the constant is used, 
as with most instructions. Note that as many as 64 bits of information may be trans- 
ferred to the coprocessor by a single store instruction. 

For coprocessor loads, the data transferred from the coprocessor is written to the 
general-purpose register given by RA; the RB or I field is unused in this case (however,
the contents of the specified register, or the zero-extended constant, appear on
the Address Bus). In contrast to the coprocessor store, a load transfers only 32 bits of
information from the coprocessor. 

Other bits in the coprocessor load and store instructions are defined as follows: 

Bit 22: Transfer Control (TC). This bit affects the behavior of the coprocessor for
the transfer, depending on whether the transfer is for a load or store. The definition of 
this bit is by convention only, and is not enforced by the processor. 

For transfers to the coprocessor (i.e., stores), a value of 1 for the TC bit causes a 
coprocessor operation to start. For transfers from the coprocessor (i.e., loads) a value 
of 1 for the TC bit causes the coprocessor to suppress exception reporting. In either 
case, a value of 0 for the TC bit has no special effect on the coprocessor.

Bit 21: Set Coprocessor Active (SA). This bit is provided to signal the beginning
and end of a coprocessor operation, so that the proper action may be taken by soft- 
ware if the operation is interrupted. 

An SA bit of 1 affects the Coprocessor Active (CA) bit in the Current Processor 
Status. If the SA bit is 1 for a store, the CA bit is set. If the SA bit is 1 for a load, the 
CA bit is reset. If the SA bit is 0, there is no effect on the CA bit.

Bit 20: reserved 

Bit 19: User Access (UA). The UA bit allows programs executing in the Supervisor
mode to emulate User-mode coprocessor transfers. This allows checking of the 
authorization of a transfer requested by a User-mode program. Note that this check- 
ing is performed externally, since the processor imposes no restriction on User-mode 
coprocessor transfers. 

If the UA bit is 1, the coprocessor transfer is performed in the User mode, regardless
of the value of the Supervisor Mode (SM) bit in the Current Processor Status. In this 
case, the User mode affects only the SUP/US output; it has no effect on the registers 
that can be accessed by the instruction. If the UA bit is 0, the program mode for the 
transfer is controlled by the SM bit. 

Bits 18-16: Option (OPT). The OPT field is placed on the OPT(2-0) outputs during
the coprocessor transfer. There is a one-to-one correspondence between the OPT 
field and the OPT(2-0) outputs; that is, the most-significant OPT bit is placed on 
OPT2, and so on.

The OPT bits define the quantities being transferred to or from the coprocessor. For 
example, they can specify whether operands or operation codes are being trans- 
ferred. The interpretation of the OPT field depends on the definition of a given 
coprocessor. 
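
As an illustration of the bit assignments described above, the following sketch packs and unpacks the control bits of a coprocessor load or store instruction. The TC, SA, UA, and OPT positions are taken from the text; the position of the CE bit (bit 23) is an assumption, since it is not stated in this section, and the function names are hypothetical.

```python
CE_BIT = 23     # assumed position of the Coprocessor Enable bit
TC_BIT = 22     # Transfer Control
SA_BIT = 21     # Set Coprocessor Active
UA_BIT = 19     # User Access (bit 20 is reserved and left as 0)
OPT_SHIFT = 16  # OPT field occupies bits 18-16


def coproc_control_field(tc, sa, ua, opt):
    """Build bits 23-16 of a coprocessor transfer instruction word."""
    if not 0 <= opt <= 7:
        raise ValueError("OPT is a 3-bit field")
    word = 1 << CE_BIT                 # CE = 1 selects a coprocessor transfer
    word |= (tc & 1) << TC_BIT
    word |= (sa & 1) << SA_BIT
    word |= (ua & 1) << UA_BIT
    word |= opt << OPT_SHIFT
    return word


def decode_control_field(word):
    """Extract the control bits from an instruction word."""
    return {
        "ce": (word >> CE_BIT) & 1,
        "tc": (word >> TC_BIT) & 1,
        "sa": (word >> SA_BIT) & 1,
        "ua": (word >> UA_BIT) & 1,
        "opt": (word >> OPT_SHIFT) & 0x7,
    }
```

The operation code, RA, and RB or I fields are deliberately left out; they would be combined with this value to form the full 32-bit instruction.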

The transfer of data to or from the coprocessor may be caused by any load or store 
instruction defined for the processor; the operation of coprocessor transfers is very 
similar to the operation of external accesses. 

Coprocessor transfers are overlapped with the execution of instructions that sequen- 
tially follow the coprocessor load or store instruction. However, only one load or store 
may be in progress in any given cycle, whether or not the load or store is directed to a
coprocessor. The pipeline interlocks that apply to external accesses also apply to 
coprocessor transfers, except that coprocessor-transfer interlocks are determined by 
the time taken by the coprocessor to perform an operation, rather than the time taken 
to perform an access. 

Note that coprocessor transfers may be performed by Load Multiple and Store 
Multiple instructions. However, register RB has no defined interpretation for a Store 
Multiple to the coprocessor. For this reason, Store Multiple is defined to transfer
multiple 32-bit quantities to the coprocessor. Similarly, a Load Multiple transfers
multiple 32-bit quantities from the coprocessor. Note, however, that the incrementing
address sequence defined for Load Multiple and Store Multiple still appears on the 
Address Bus for coprocessor transfers. 

6.1.3 Coprocessor Exceptions 

A Coprocessor Exception trap occurs if the coprocessor reports an exception (using
the DERR signal) during a coprocessor transfer. The Coprocessor Exception may 
occur either for a coprocessor load or store. 

In the case of a load that reads a coprocessor result, the Coprocessor Exception can
be used to indicate that the result is incorrect because of some exceptional condition. 
In some cases, the Am29050 microprocessor might be able to correct the results of 
the operation. 

In the case of a store to the coprocessor, the Coprocessor Exception can be used to 
indicate that the coprocessor cannot accept the transfer because of some exceptional 
condition. For example, it may indicate an error in a stream of calculations, where 
intermediate results are not being read. As with a load, the Am29050 microprocessor 
may be able to correct the exceptional condition. 

As noted above, the trap handler that executes as the result of the Coprocessor Ex- 
ception trap may attempt to correct the exceptional condition. In many cases, the trap 
handler must be able to read the intermediate results of the operation from the 
coprocessor, along with other information about the operation. When this information 
is read, it may be necessary to suppress further exception reporting, so that the trap 
handler does not create additional Coprocessor Exception traps. For this reason, the 
TC bit in the coprocessor load or store instruction allows the processor to read 
coprocessor results while suppressing exception reporting. 

Additionally, the TC bit allows a program to read the result of a coprocessor operation 
regardless of any errors that may have occurred. This provides an optional trapping 
capability analogous to that provided for certain Am29050 microprocessor arithmetic 
operations (e.g., Am29050 microprocessor instructions allow an optional trap on 
arithmetic overflow). 

6.1.4 Coprocessor as a System Option

When the coprocessor is a system option, coprocessor operations are performed by 
the processor when the coprocessor is not present. 

The coprocessor may be designed as a system option by use of the Coprocessor 
Present (CP) bit of the Configuration Register. The CP bit is set during system initialization,
based on the presence (CP = 1) or absence (CP = 0) of the coprocessor. If the
CP bit is 0 when the processor attempts to execute a coprocessor load or store instruction,
a Coprocessor Not Present trap occurs.

When a Coprocessor Not Present trap is taken, the Channel Address, Channel Data, 
and Channel Control registers contain information related to the coprocessor transfer. 
This information may be used by the trap handler to emulate the operation of the 
coprocessor. 

6.1.5 Interrupted Coprocessor Operations 

The Coprocessor Active (CA) bit of the Current Processor Status Register may be 
used to indicate the duration of a coprocessor operation. The value 1 in the CA bit 
indicates that the coprocessor has begun an operation that has not completed (i.e., 
the final results have not been read). 

The CA bit is affected by the Set Coprocessor Active (SA) bit in the coprocessor load 
and store instructions. If the SA bit is 1 for a store, the CA bit is set; if the SA bit is 1 
for a load, the CA bit is reset. The routine that accesses the coprocessor is responsi- 
ble for setting and resetting the CA bit appropriately. 

If an interrupt or trap is taken during a coprocessor operation, and the CA bit has 
been properly managed, the CA bit of the Old Processor Status signals to an interrupt 
or trap handler that the interrupted routine had begun a coprocessor operation, but 
had not completed the operation before the interrupt or trap was taken. In this case, 
the coprocessor contains state information that must be preserved. This information
may be saved and restored across the interrupt or trap, or, alternatively, kept in the
coprocessor. 

Upon an interrupt or trap, the state information contained in the coprocessor depends 
on both the operation being performed and the definition of the coprocessor. The 
methods used to determine what state information must be saved, and the methods 
used to transfer this information, are also dependent on the definition of the 
coprocessor. 

Due to interrupt-latency considerations, it may be desirable to leave state information
in the coprocessor upon interrupt, rather than require that it always be saved. A prob- 
lem arises, however, when a routine other than the one that was originally interrupted 
attempts to use the coprocessor. The coprocessor may be protected from such use 
by resetting the CP bit in the Configuration Register. If another routine attempts to use 
the coprocessor in this case, a Coprocessor Not Present trap occurs. The trap han- 
dler for this trap may either save the coprocessor state and make the coprocessor 
available to the trapping routine, or return control to the routine that was originally 
using the coprocessor. 
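
The deferred-save scheme described above can be sketched as follows. This is a minimal software model, not an operating-system implementation: the `Kernel` class, its attribute names, and the use of strings for routine names and coprocessor state are all hypothetical. It models only the first handler option (save the state and hand the coprocessor to the trapping routine).

```python
class Kernel:
    def __init__(self):
        self.cp_present = True    # models the CP bit of the Configuration Register
        self.owner = None         # routine whose state is still in the coprocessor
        self.coproc_state = None  # stands in for state held by the coprocessor
        self.saved_state = {}     # state saved on behalf of interrupted routines

    def on_interrupt(self, interrupted, ca_bit):
        """If CA was set, leave the state in place and protect it by resetting CP."""
        if ca_bit:
            self.owner = interrupted
            self.cp_present = False

    def use_coprocessor(self, routine, new_state):
        """A coprocessor load or store; traps first if CP is 0."""
        if not self.cp_present:
            self.not_present_trap(routine)
        self.coproc_state = new_state

    def not_present_trap(self, routine):
        """Coprocessor Not Present handler: save the owner's state only when needed."""
        if self.owner is not None and routine != self.owner:
            self.saved_state[self.owner] = self.coproc_state
        self.owner = None
        self.cp_present = True
```

The state is saved only if some other routine actually touches the coprocessor, which keeps the common interrupt path short.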

Certain coprocessor operations may not be interruptible. For these operations, interrupts
may be disabled by the Disable Interrupts (DI) and/or Disable All Interrupts and
Traps (DA) bits in the Current Processor Status Register. However, this disabling can 
be performed only by a program in the Supervisor mode. Any User-mode programs 
that perform non-interruptible coprocessor operations incur the overhead of a call to a 
Supervisor-mode program. 

6.2 COPROCESSOR ATTACHMENT 

Communication with the coprocessor occurs via the Am29050 microprocessor chan- 
nel. Figure 6-2 illustrates a typical coprocessor connection. For transfers to the 
coprocessor, 64 bits of data are transferred in a single cycle, using the Address Bus 
and Data Bus simultaneously. For transfers from the coprocessor, 32 bits of data are 
transferred in a cycle, using the Data Bus. 

The width of transfers to the coprocessor is greater than the width of transfers from 
the coprocessor because the Am29050 microprocessor is optimized for computations 
performed on two word-length operands, with a single word-length result. The operand/result
data flow of the processor is reflected in the interface to the coprocessor.

The protocol for coprocessor transfers is nearly identical to the protocol for other 
external accesses on the channel. Minor differences result from the fact that there are 
no addresses for coprocessor transfers, and from the fact that the coprocessor is 
operation-oriented, rather than access-oriented. 

6.2.1 Signal Description 

Coprocessor transfers are indicated on the channel by the DREQT1 output being
High during a request. The DREQT0 output also affects the transfer, based on the
R/W signal, as follows:

R/W  DREQT1  DREQT0  Meaning
0    1       0       Transfer to coprocessor
0    1       1       Transfer to coprocessor, start operation
1    1       0       Transfer from coprocessor
1    1       1       Transfer from coprocessor, suppress errors

Figure 6-2

Note that the interpretation of DREQT0 during a coprocessor transfer is by convention
only. 
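
A coprocessor's request decoder could classify transfers along the following lines. The sketch assumes that R/W is 1 for a read (transfer from the coprocessor) and 0 for a write (transfer to the coprocessor), consistent with the statement that write protocols are used for transfers to the coprocessor; the function name `classify_transfer` is hypothetical.

```python
def classify_transfer(rw, dreqt1, dreqt0):
    """Classify a channel request from the R/W and DREQT(1-0) signal levels."""
    if dreqt1 != 1:
        # DREQT1 Low: an ordinary access, not a coprocessor transfer.
        return "not a coprocessor transfer"
    if rw == 0:
        # Write direction: processor to coprocessor.
        if dreqt0:
            return "transfer to coprocessor, start operation"
        return "transfer to coprocessor"
    # Read direction: coprocessor to processor.
    if dreqt0:
        return "transfer from coprocessor, suppress errors"
    return "transfer from coprocessor"
```

Because the meaning of DREQT0 is by convention only, a particular coprocessor is free to interpret it differently.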

The only signal unique to coprocessor transfers is the CDA input. The coprocessor 
de-asserts this signal whenever it can accept no transfers from the processor (nor- 
mally, this is because it is performing an operation). 

The completion of a transfer to the coprocessor is indicated when the coprocessor
asserts CDA. The input DRDY is not used in this case. The performance of transfers
to the coprocessor is enhanced by the use of CDA, since it eliminates the need for the
coprocessor to decode a transfer request and respond with DRDY, and thereby eliminates
the logic delay involved. Note that the coprocessor normally de-asserts CDA
when it starts an operation, so that CDA can be independent of transfer requests.

6.2.2 Coprocessor Communication 

The Address Bus is used to transfer information to the coprocessor. Therefore, the
addressing function of other devices and memories on the channel must be disabled
during coprocessor transfers. Since DREQT1 is High for all coprocessor transfers, it
should be used to inhibit the address-decoding function of channel devices and
memories, as well as to indicate to the coprocessor that a transfer is occurring.

The OPT(2-0) outputs are used during coprocessor transfers to indicate the type of
transfer, or to provide other controls for the coprocessor. The interpretation of the
OPT(2-0) signals depends on the implementation of the coprocessor, and may also
depend on the R/W signal.

6.2.2.1 COPROCESSOR TRANSFER PROTOCOLS 

The protocols available for coprocessor transfers are based on the protocols for 
simple, pipelined, and burst-mode data accesses discussed in Section 5.2.6. The
protocols for write accesses are used for transfers to the coprocessor, and the proto- 
cols for read accesses are used for transfers from the coprocessor. 

The protocol for coprocessor transfers differs in several respects from the protocol for
external data accesses: 



1. The CDA signal consistently replaces DRDY for transfers to the coprocessor.
An active level on CDA, for transfers to the coprocessor, has an effect that is
equivalent to the effect of an active level on DRDY for normal store operations.
Note that DRDY is still used for transfers from the coprocessor.

2. The Address Bus does not contain an address during a coprocessor transfer, but 
may contain data in the case of a transfer to the coprocessor. However, for 
transfers from the coprocessor, the Address Bus is still sequenced as described in
Section 5.2, and the sequencing is determined by the same controls, except that
CDA replaces DRDY for transfers to the coprocessor. The contents of the Address
Bus are determined by the coprocessor load instruction, as for other load
instructions.



3. For any coprocessor transfer, an active level on DERR causes a Coprocessor 
Exception trap, rather than a Data Access Exception trap. 

4. For burst-mode coprocessor transfers, the interpretation of sequential addressing
is undefined. For this reason, burst-mode transfers are normally restricted to 32 
bits of information for every transfer, regardless of whether the transfer is to or 
from the coprocessor. Note, however, that the incrementing address sequence is 
still present in the definition of a burst-mode coprocessor transfer, and may be 
useful in some cases.



6.2.2.2 SEQUENCING OF CDA 

The coprocessor de-asserts CDA whenever it cannot accept a transfer from the 
Am29050 microprocessor. An inactive level on CDA prevents the Am29050
microprocessor from transferring operands or operation codes to the coprocessor
when these transfers might interfere with coprocessor operation. 



Normally, the coprocessor de-asserts CDA when it begins an operation. CDA remains
inactive until the coprocessor has completed the operation and can accept further
transfers from the processor. For some operations, a result may have to be read
before the coprocessor can assert CDA. 

Independent of the presence of the coprocessor, a pull-down resistor in the range of 
33K to 68K ohms on CDA is necessary for the standard coprocessor detector to function
properly. 



The coprocessor can acknowledge a transfer by asserting CDA. However, it is generally
more efficient for the coprocessor to hold CDA active as long as it can accept
transfers. In the latter case, multiple data transfers can occur at a high rate, without
involving long logic delays. CDA is related to the operation of the coprocessor in this
case, rather than to the transfer of data. 



6.2.2.3 EXCEPTION REPORTING 



The coprocessor reports exceptions by the activation of DERR during any 
coprocessor transfer. This causes a Coprocessor Exception trap to occur. However, if 
the DREQT(1-0) signals have the value 11 for a transfer from the coprocessor, exception
reporting should be suppressed, and DERR should not be asserted. Note,
however, that the Am29050 microprocessor does not enforce the suppression of 
exception reporting. 

CHAPTER 7

PROGRAMMING

This chapter discusses programming topics as they relate to the Am29050 micropro-
cessor. It focuses on the use of processor resources that were more formally de- 
scribed in Chapter 3. The presentation in this chapter is intended to be used as a 
guide in the implementation of software systems for the processor, not necessarily as
a strict definition of how these systems should be implemented. 

This chapter is organized into four sections. The first section describes the run-time 
storage organization recommended for the Am29050 microprocessor and the use of 
the local registers to improve the performance of procedure calls. The two subse- 
quent sections discuss applications and systems programming for the processor. The 
final section discusses certain features of the Am29050 microprocessor pipeline that 
are exposed to — and must be properly handled by — software which executes on the 
processor. 

7.1 RUN-TIME STORAGE ORGANIZATION AND CALLING CONVENTION

Programming languages that use recursive procedures, such as C and Pascal, gener- 
ally use a stack to store data objects that are dynamically allocated at run-time. The 
organization of the run-time storage, including the run-time stack, determines how 
data objects are stored and how procedures are called at the machine level. The 
Am29050 microprocessor is designed to minimize the overhead of calling a proce- 
dure, and allows efficient passing of parameters to a procedure and returning of 
results from a procedure. This section describes the Am29050 microprocessor run- 
time storage organization and procedure-calling conventions. 

7.1.1 Run-Time Stack Organization and Use 

A run-time stack consists of consecutive overlapping structures called activation 
records. An activation record contains dynamically allocated information specific to a 
particular activation (or call) of a procedure (such as local data objects). Because of 
recursion, multiple copies of a procedure may be active at any given time. Each active 
procedure has its own unique activation record, allocated somewhere on the run-time
stack. The local variables required by a particular procedure activation are contained
in the activation record associated with that activation. Thus, the local variables for
different activations do not interfere with one another. A compiler generates the
instructions to create and manage the run-time stack, and compiler-generated
instructions are based on its existence.

As an example, Figure 7-1 shows three activation records on a run-time stack. This 
stack configuration was generated by procedure A calling procedure B, which in turn 
called procedure C. The fact that procedure C is the currently active procedure is
reflected by its activation record being on the top of the run-time stack. The Stack 
Pointer points to the top of procedure C's activation record. 

Figure 7-1

In Figure 7-1, the storage areas labeled Out args and In args are the outgoing arguments
area (for the caller) or the incoming arguments area (for the callee). These are
shared between the caller procedure and the callee for the communication of parame- 
ters and results. The areas labeled locals contain storage for local variables, tempo- 
rary variables (for example, for expression evaluation) and any other items required 
for the proper execution of the procedure. 

7.1.1.1 MANAGEMENT OF THE RUN-TIME STACK

A run-time stack starts at a high address in memory and grows toward lower memory 
addresses as procedures are called. The bottom of the stack is the location, with a 
high address, at which the stack starts; the top of the stack is the location, with a 
lower address, at which the most recent activation record has been allocated. 

When a procedure is called, a new activation record may need to be allocated on the 
run-time stack. An activation record is allocated by subtracting from the stack pointer 
the number of locations needed by the new activation record. The stack pointer is 
decremented so that variables referenced during procedure execution are referenced 
in terms of positive offsets from the stack pointer. 

When storage for an activation record is allocated, the number of storage locations 
allocated is the sum of the number of locations needed for: 

1. Local variables; 

2. Restarting the caller, such as locations for return addresses; and 

3. Arguments of procedures that may be called in turn by the called procedure (the 
outgoing arguments area). 

Note that, in some cases, no storage is required for one or more of the above items. 
Also, the incoming arguments area, though it is part of the activation record of the 
callee, is not allocated storage at this time, because this storage was allocated as the 
outgoing arguments area of the calling procedure. 

An activation record is de-allocated, just prior to returning to the caller, by adding to 
the stack pointer the value that was subtracted during allocation. 
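
The allocation rule above can be sketched as a small model. This is an illustrative sketch only: the class name, the method names, and the example sizes are hypothetical, and addresses are counted in abstract locations rather than bytes.

```python
class RunTimeStack:
    """Toy model of a run-time stack that grows toward lower addresses."""

    def __init__(self, base):
        self.base = base      # bottom of the stack: the high starting address
        self.sp = base        # stack pointer: top of the stack
        self.sizes = []       # sizes subtracted so far, kept for de-allocation

    def allocate(self, n_locals, n_restart, n_out_args):
        """Allocate an activation record; return the new stack pointer."""
        size = n_locals + n_restart + n_out_args
        self.sp -= size       # subtract the number of locations needed
        self.sizes.append(size)
        return self.sp

    def deallocate(self):
        """Add back the value that was subtracted during allocation."""
        self.sp += self.sizes.pop()
        return self.sp


stack = RunTimeStack(base=1000)
sp_a = stack.allocate(n_locals=4, n_restart=1, n_out_args=2)  # procedure A
sp_b = stack.allocate(n_locals=0, n_restart=1, n_out_args=0)  # procedure B
```

Because the stack pointer always holds the lowest address of the current record, every item in that record lies at a non-negative offset from it.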






The Am29050 microprocessor run-time storage is actually implemented as two 
stacks: the Register Stack and the Memory Stack. Storage is allocated and de-allocated 
on these stacks at the same time. The Register Stack stores activation records 
associated with all active procedures (except leaf routines, as described later). The 
Memory Stack stores activation-record information that does not fit into the Register 
Stack or that must be kept in memory for other reasons (e.g., because of pointer 
de-references). Both the Register Stack and the Memory Stack are stored in the 
external data memory. However, a portion of the Register Stack is kept in the 
Am29050 microprocessor local registers for performance. The term stack cache in 
this section refers to the use of the local registers to contain a portion of the Register 
Stack. 

7.1.1.2 THE REGISTER STACK 

The Register Stack contains activation records for active procedures (Figure 7-2). An 
activation record in the Register Stack stores the following information. 

• Input arguments to the called procedure. This portion of the activation record is 
shared between a caller and the callee. It is allocated by the caller as part of the 
caller's activation record. 

• The caller's frame pointer. This is the address of the lowest-addressed byte above 
the highest-address word of the caller's activation record, and is used to manage 
the Register Stack. This portion of the activation record is shared between a caller 
and the callee. It is allocated by the caller as part of the caller's activation record. 

• The caller's return address. This is used to resume the execution of the caller after 
the called procedure terminates. This is also part of the caller's activation record. 

• The memory frame pointer. This is the address of the top of the caller's Memory 
Stack (see below). This address is stored by the callee (if required), and used to 
restore the memory stack upon return. 

• The local variables of the called procedure, if any. 

• Outgoing parameters of the called procedure, if any. 

• The frame pointer of the called procedure, if the procedure calls another procedure. 

• The return address for the called procedure, if the procedure calls another 
procedure. This location is allocated in the Register Stack, and used when the 
called procedure calls another procedure. 

7.1.1.3 Am29050 MEMORY LOCAL REGISTERS AS A STACK CACHE 

The Am29050 microprocessor was designed for efficient implementation of the Register 
Stack. Specifically, the Am29050 microprocessor can use the large number of 
relatively addressed local registers to cache portions of the Register Stack, yielding a 
significant gain in performance. Allocation and de-allocation of activation records 
occur largely within the confines of the high-speed local registers, and most procedure 
calls occur without external references. Furthermore, during procedure execution 
most data accesses occur without external references, because activation-record 
data are referenced most frequently. The principle of locality of reference, which 
allows any cache to be effective, also applies to the stack cache. The entries in the 
stack cache are likely to remain there for re-use, because the size of the Register 
Stack does not change very much over long intervals of program execution. Activation 
records are typically small, so the 128 locations in the local register file can hold many 
activation records. 

Allocating Register-Stack activation records in the local registers is facilitated by the 
Stack Pointer in Global Register 1. During the execution of a procedure, the Stack 
Pointer points simultaneously to the top of the Register Stack in memory and to the 
local register at the top of the stack cache. In other words, Global Register 1, a word-
length register, contains the 32-bit address of the top of the Register Stack, while bits 
8-2 of Global Register 1 (with a 1 appended as the most-significant bit) indicate the 
absolute register number of Local Register 0. Allocation and de-allocation of the 
Register Stack is accomplished by subtracting from or adding to, respectively, the 
value of the Stack Pointer. 
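
The mapping from the Stack Pointer to a register number can be illustrated as follows. This is a sketch, not a description of the hardware implementation: the function name is hypothetical, and the handling of local registers other than Local Register 0 (adding the local-register number modulo 128) is stated here as an assumption about Stack-Pointer addressing.

```python
def abs_local_register(gr1, n=0):
    """Absolute register number selected by local register n.

    gr1 is the 32-bit Stack Pointer value held in Global Register 1.
    """
    base = (gr1 >> 2) & 0x7F            # bits 8-2 of Global Register 1
    # Assumption: local register n is found by adding n modulo 128.
    return 0x80 | ((base + n) & 0x7F)   # a 1 as the most-significant bit


lr0_before = abs_local_register(0x200)       # Stack Pointer at 0x200
lr0_after = abs_local_register(0x200 - 8)    # after allocating two words
```

Subtracting from the Stack Pointer therefore moves Local Register 0 to a lower-numbered absolute register, wrapping around within the 128 local registers.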

Using this register-addressing scheme, locations from the Register Stack are auto- 
matically mapped into the local register file. Figure 7-3 shows the relationship be- 
tween the Register Stack and the stack cache in the local registers. As shown, point- 
ers are required to define the boundaries between the Register Stack and the stack 
cache. 

• The register free bound (rfb, gr127) pointer defines the boundary between the 
portion of the Register Stack that is cached in the local registers and the portion that 
is stored in the external data memory. The rfb pointer contains the address of the 
first word in the Register Stack that is not contained in the local registers, but which 
is in memory. 

• The frame pointer (fp, lr1) contains the memory address of the lowest-addressed 
word not in the current activation record. The current activation record is not 
necessarily in the data memory: the fp is used to determine whether or not an 
activation record is contained in the local registers when a procedure returns from a 
call, as described later. 

• The register stack pointer (rsp, gr1) points to the top of the Register Stack either in 
the local registers or the data memory; the rsp is contained in the local-register 
Stack Pointer (Global Register 1). The top of the Register Stack may or may not be 
contained in the data memory; the rsp simply defines the location of the top of the 
Register Stack. 

• The register allocate bound (rab, gr126) pointer defines the lowest-addressed stack 
location that can be cached within the local registers. This defines the limit to which 
local registers can be allocated in the Register Stack. 

Figure 7-3 

Several activation records may exist in the Register Stack at any given time, but only 
one stack location may be mapped to a local register at a given time. When the Reg- 
ister Stack grows beyond the 128-word capacity of the local registers, some move- 
ment of data between the stack cache and the Register Stack in data memory must 
occur. 

Stack overflow occurs when a procedure is called, but the activation record of the 
callee requires more registers than can be allocated in the stack cache (this is 
detected by comparing rsp with rab); Figure 7-4 illustrates stack overflow. In this case, 
the contents of a number of registers must be moved to data memory. The number of 
registers involved must be sufficient to allow the entire activation record of the callee 
to reside in the local registers. A block of the registers is copied, or spilled, into an 
area of external data memory, freeing space in the local register file for the most 
recent procedure call. 

Stack underflow occurs when a procedure returns to the caller, but the entire activa- 
tion record of the caller is not resident in the stack cache (this is detected by compar-
ing fp with rfb); Figure 7-5 illustrates stack underflow. In this case, the non-resident 
portion of the caller's stack must be moved from data memory to the local registers. 
Underflow occurs because overflow occurred at some previous point during program 
execution, causing part of the Register Stack to be moved to data memory. 

Figure 7-4 

The processor performs no hardware management of the stack cache, and cannot 
detect a reference to a quantity that is not in the stack cache. Consequently, software 
must keep the size of an activation record less than or equal to the size of the local 
register file (128 words). Any additional storage requirements are satisfied by the 
Memory Stack. 
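
Since detection is left to software, the two conditions can be written as simple comparisons. The following is a sketch under stated assumptions: the function names and the constant are hypothetical, the direction of each comparison follows the assert instructions used in the prologue and epilogue of Section 7.1.2, and the fixed 512-byte distance between rab and rfb is an assumption based on the 128-word local register file.

```python
WINDOW_BYTES = 128 * 4   # assumed distance between rab and rfb (128 words)


def overflow(rsp, rab):
    """True if a newly allocated frame extends below the allocate bound."""
    return rsp < rab


def underflow(fp, rfb):
    """True if the caller's frame extends above the free bound."""
    return fp > rfb


rfb = 0x10000
rab = rfb - WINDOW_BYTES
needs_spill = overflow(rsp=rab - 16, rab=rab)     # frame is 4 words too low
needs_fill = underflow(fp=rfb + 16, rfb=rfb)      # 4 words are not resident
```

All addresses are treated as unsigned byte addresses, which is why plain integer comparison is sufficient in this model.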



7.1.1.4 THE MEMORY STACK 

In general, the Memory Stack is used to augment the Register Stack, holding 
additional information associated with activation records. For example, the Memory Stack 
holds large data structures that cannot fit into the Register Stack. Similar to the 
Register Stack, the Memory Stack contains a series of (possibly overlapping) activation 
records, each corresponding to a procedure activation. However, a Memory Stack 
activation record need not exist for a procedure that does not need a Memory Stack 
area. The Memory Stack contains the following information: 

• Overflow incoming arguments. These are incoming arguments that do not fit in the 
allowed incoming arguments area of the Register Stack activation record. 

• Spilled incoming arguments. These are incoming arguments that cannot be kept in 
the Register Stack. For example, if the address of an argument is used in a called 
procedure, the associated value must be in the Memory Stack. 

• Any procedure-local variable not allocated to a register. 

• Local block space. This storage is allocated dynamically on the Memory Stack. It is 
used to implement functions such as the alloca() function in the C programming 
language. 

• Overflow outgoing arguments. These are outgoing arguments that do not fit in the 
allowed outgoing arguments area of the Register Stack activation record. 

In contrast to the Register Stack, the Memory Stack is not cached and has no fixed 
size limit. The top of the Memory Stack is defined by the memory stack pointer (msp), 
which is stored in Global Register 125 by convention. 

7.1.2 Procedure Linkage Conventions 

The procedure linkage conventions define the standard sequences of instructions 
used to call and return from procedures. These instruction sequences perform the 
following operations (other, more-general operations may also be required, as de- 
scribed later): 

• Place procedure arguments in the outgoing arguments area of the activation record. 
This may or may not involve copying the arguments; copying is not necessary if the 
arguments are placed into the appropriate registers as the result of computation. 

• Branch to the procedure using a call instruction, which also places the return 
address in a register. 

• Allocate a frame on the Register Stack. A frame is the storage that contains the 
procedure's activation record. 

• If overflow occurs during frame allocation, spill the least-recently used locations of 
the Register Stack. The number of spilled locations must be sufficient to allow the 
new frame to reside entirely within the local registers. 

• Determine the frame-pointer value of the called procedure, if this procedure may 
call another procedure. 

• Execute the procedure. 




• Place return values into the appropriate registers. 

• De-allocate the activation-record frame. 

• Fill locations of the local registers from the Register Stack in external memory, if 
underflow occurs. 

• Branch to the procedure's return address. 

This section describes the routines that implement the Am29050 microprocessor 
procedure linkage conventions. The operations described here are not required on 
every procedure call. In some cases, operations can be omitted or simpler routines 
used; these cases and the accompanying simplifications are also described here. 

7.1.2.1 ARGUMENT PASSING 

The linkage convention allows up to 16 words of arguments to be passed from the 
caller to the callee in local registers. These arguments are passed in Local Register 2 
through Local Register 17 of the caller (note that the local-register numbers are differ- 
ent for the caller and the callee, because of Stack-Pointer addressing). 

When more than 16 words are required to pass arguments, the additional words are 
passed on the Memory Stack. In this case, the memory stack pointer (in Global Register 
125) points to the 17th word of the arguments, and the remaining argument words 
have higher memory addresses. Multi-word arguments may be split across the Regis- 
ter Stack and the Memory Stack. For example, if a multi-word argument starts on the 
16th word of the outgoing arguments, the first word of the argument is passed in the 
Register Stack, and the remainder of the argument is passed in the Memory Stack. 

All arguments occupy at least one word; arguments which are a byte or half-word in 
length (for example, a character) are padded to 32 bits and passed as a full word. 
However, an array or structure composed of multiple byte or half-word components is 
passed as a single, packed array or structure of bytes or half-words rather than an 
array or structure of padded bytes or half-words. 

No argument is aligned to other than a word address boundary, including multi-word 
arguments. Some multi-word arguments are referenced as a single object (for example, 
double-precision floating-point values). Note that it may be necessary to copy 
such arguments to an aligned memory or register area before use. 
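
The placement of argument words can be illustrated with a short sketch. The function name and the return format are hypothetical; the model only reflects the rules stated above (16 words in Local Register 2 through Local Register 17 of the caller, then consecutive Memory Stack words starting at the address in the memory stack pointer).

```python
MAX_REG_ARG_WORDS = 16   # Local Register 2 through Local Register 17


def place_argument_words(words, msp):
    """Pair each outgoing argument word with a register name or an address."""
    placement = []
    for i, word in enumerate(words):
        if i < MAX_REG_ARG_WORDS:
            placement.append((f"lr{2 + i}", word))   # caller's register number
        else:
            # The 17th word is at msp; later words are at higher addresses.
            placement.append((msp + 4 * (i - MAX_REG_ARG_WORDS), word))
    return placement


# A two-word argument that starts on the 16th word is split across the stacks.
layout = place_argument_words(list(range(17)), msp=0x8000)
```

In this example the last register word and the first Memory Stack word together hold the split two-word argument, which is why the callee may need to copy it to an aligned area before use.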

7.1.2.2 PROCEDURE PROLOGUE 

When a procedure is called, and the procedure may call another procedure, the callee 
must allocate a frame for itself on the Register Stack (this is not required for leaf 
procedures that do not call other procedures, as described later). A frame is allocated 
by decrementing the register stack pointer to accommodate the size of the required 
activation record. The procedure prologue is the instruction sequence that allocates 
the callee's Register Stack frame. 

To allocate the stack frame, the prologue routine decrements the register stack 
pointer by the amount rsize (see Figure 7-6). The value of rsize must be an even 
number given by the following formula: 

rsize >= (size of local variable area) + (size of outgoing arguments area) + 2 

The value 2 in this formula accounts for the space required by the return address (in 
Local Register 0) and the frame pointer (in Local Register 1). The size of the local 
variable area includes the space for the memory frame pointer, if required. If the 
formula total is an odd value, the total must be adjusted (by adding 1) so that the 
resulting rsize value is even. This aligns the top of the Register Stack on a double-
word boundary. The reason for this alignment is that double-precision floating-point 
values must be aligned to registers with even absolute-register numbers. Alignment of 
double-precision values is accomplished by placing these values into even-numbered 
local registers and making rsize even (it is also assumed that the register stack 
pointer is initialized on an even-word boundary). 

Note that rsize is not the size of the entire activation record of the callee, because the 
callee's activation record includes storage that was allocated as part of the caller's 
activation record frame (e.g., the caller's outgoing arguments area, which is the 
callee's incoming arguments area). The size of the callee's entire activation record is 
denoted size, and is given by the following formula: 

size = rsize + (size of the incoming arguments area) + 2 
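
The two formulas can be worked through numerically. This sketch uses a hypothetical function name and picks the smallest even rsize that satisfies the first formula; a compiler may choose a larger value.

```python
def frame_sizes(local_words, out_arg_words, in_arg_words):
    """Return (rsize, size) in words for a Register Stack frame."""
    # +2: return address (Local Register 0) and frame pointer (Local Register 1)
    rsize = local_words + out_arg_words + 2
    rsize += rsize % 2                   # add 1 if odd, so that rsize is even
    size = rsize + in_arg_words + 2      # whole activation record of the callee
    return rsize, size


small = frame_sizes(local_words=3, out_arg_words=2, in_arg_words=1)
large = frame_sizes(local_words=120, out_arg_words=2, in_arg_words=1)
```

For the first case the raw total is 7 words, so rsize is rounded up to 8 and size is 11. The second case gives an rsize of 124 and a size of 127, which still fits the 128-word limit on an activation record.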

In the prologue routine, the following instruction is used to allocate the stack frame 
(rsp=gr1): 

prologue: 

sub rsp,rsp,rsize*4 ; *4 converts words to bytes 

However, this instruction does not account for the fact that there may not be enough 
room in the local registers to contain the activation record. There must be additional 
instructions to detect stack overflow and to cause spilling if overflow occurs. This is 
accomplished by comparing the new value of the register stack pointer with the value 
of the register allocate bound and invoking a trap handler (with vector number 
V_SPILL) if overflow is detected. 

Furthermore, if the procedure calls another procedure, the prologue must compute a 
frame pointer. The frame pointer will be used by procedures called in turn by the 
callee to ensure that the callee's activation record is in the local registers upon return 
(i.e., that it has not been spilled onto the Register Stack in data memory). The frame 
pointer is computed in the prologue because it need only be computed once, regardless 
of how many procedures are called by a given procedure. 

The complete procedure prologue is then (fp = lr1): 

prologue:
    sub    rsp, rsp, rsize*4        ; allocate frame
    asgeu  V_SPILL, rsp, rab        ; call spill handler if needed
    add    fp, rsp, size*4          ; compute frame pointer



7.1.2.3 SPILL HANDLER 

If overflow occurs, the assert instruction in the prologue fails, causing a trap. The 
trap handler invokes a User-mode routine in the trapping process to spill Register 
Stack locations from the local registers to external memory. Having most of the spill 
handling in a User-mode routine minimizes the amount of time that interrupts are 
disabled, and ensures that spilling is performed using the correct virtual-memory 
configuration. 

The spill handler uses two registers. The first register, Global Register 121, normally 
contains a trap-handler argument (tav), but is used by the spill handler as a temporary 
register. The second register, Global Register 122, stores a trap handler return ad- 
dress (tpc). This register is used by the User-mode spill handler to return to the trap- 
ping procedure. It is assumed that the address of the User-mode spill handler is 
contained in a global register, denoted user_spill_reg in the following instruction 
sequence. 

The complete spill handler is: 

Spill:                              ; operating-system routine
    mfsr   tpc, PC1                 ; save return address
    mtsr   PC1, user_spill_reg      ; branch to User spill via interrupt return
    add    tav, user_spill_reg, 4
    mtsr   PC0, tav
    iret

user_spill:                         ; User-mode spill handler
    sub    tav, rab, rsp            ; compute spill: allocate bound - rsp
    srl    tav, tav, 2              ; shift to get number of words
    sub    tav, tav, 1              ; count is one less
    mtsr   CR, tav                  ; set Count Remaining Register
    sub    tav, rab, rsp
    sub    tav, rfb, tav            ; compute new free bound
    add    rab, rsp, 0              ; adjust allocate bound
    storem 0, 0, lr0, tav           ; spill
    jmpi   tpc                      ; return to trapping procedure
    add    rfb, tav, 0              ; adjust free bound
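
The pointer arithmetic of the User-mode spill handler can be followed in a few lines of Python. This is a sketch of the arithmetic only, with a hypothetical function name; it does not model the register file or the store-multiple operation itself.

```python
def spill(rsp, rab, rfb):
    """Return (words_spilled, new_rab, new_rfb) for an overflowing frame."""
    nbytes = rab - rsp         # how far the new frame extends below rab
    words = nbytes >> 2        # byte count to word count
    new_rfb = rfb - nbytes     # spilled words occupy [new_rfb, rfb) in memory
    new_rab = rsp              # the allocate bound moves down to rsp
    return words, new_rab, new_rfb


words, new_rab, new_rfb = spill(rsp=0xFDF0, rab=0xFE00, rfb=0x10000)
```

Both bounds move down by the same number of bytes, so the distance between rab and rfb is unchanged by a spill. Under the assumption that this distance is 128 words, the memory address new_rfb and the address in rsp select the same local register, which is consistent with the store-multiple starting at Local Register 0.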



7.1.2.4 RETURN VALUES 

If the called procedure returns one or more results, the first 16 words of the result(s) 
are returned in Global Register 96 through Global Register 111, starting with Global 
Register 96. 

If more than 16 words are required for the results, the additional words are returned in 
memory locations allocated by the caller. In this case, a large return pointer (lrp) 
provided by the caller in Global Register 123 at the time of the call points to the 17th 
word of the results, and subsequent words are stored at higher memory addresses. 

7.1.2.5 PROCEDURE EPILOGUE 

The procedure epilogue de-allocates the stack frame that was allocated by the proce- 
dure prologue, and returns to the calling procedure. Stack de-allocation is accom- 
plished by adding the rsize value back to the register stack pointer, after which the 
de-allocated registers are no longer used and are considered invalid. The epilogue 
also detects stack underflow and causes register filling if underflow occurs. This is 
accomplished by comparing the value of the caller's frame pointer with the register 
free bound and invoking a trap handler (with vector number V_FILL) if underflow is 
detected. Finally, the epilogue returns to the caller using the caller's return address. 

The complete procedure epilogue is: 

epilogue:
    add    rsp, rsp, rsize*4        ; add back rsize count
    nop                             ; cannot reference a local register here
    asleu  V_FILL, fp, rfb          ; call fill handler if needed
    jmpi   lr0                      ; jump to return address



7.1.2.6 FILL HANDLERS 

If underflow occurs, the assert instruction in the epilogue fails, causing a trap. The 
trap handler invokes a User-mode routine in the trapping process to fill Register Stack 
locations from the external memory to local registers. The fill handler is similar in 
organization to the spill handler discussed above. 
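
As a rough model of what a fill must accomplish, the pointer arithmetic can be written out in Python. The function name is hypothetical, and the sketch follows the arithmetic of the User-mode fill handler in this section without modeling the load-multiple operation itself.

```python
def fill(fp, rab, rfb):
    """Return (words_filled, first_register, new_rab, new_rfb)."""
    nbytes = fp - rfb                        # non-resident part of the caller's frame
    words = nbytes >> 2                      # byte count to word count
    first_reg = 0x80 | ((rfb >> 2) & 0x7F)   # register that receives the word at rfb
    return words, first_reg, rab + nbytes, fp


words, first_reg, new_rab, new_rfb = fill(fp=0x10000, rab=0xFDF0, rfb=0xFFF0)
```

The words at memory addresses from rfb up to, but not including, fp are reloaded, and both bounds move up by the same amount. A fill of this size exactly reverses a spill that moved the bounds down by 16 bytes.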

The complete fill handler is: 

Fill:                               ; operating-system routine
    mfsr   tpc, PC1                 ; save return address
    mtsr   PC1, user_fill_reg       ; branch to User fill via interrupt return
    add    tav, user_fill_reg, 4
    mtsr   PC0, tav
    iret

user_fill:                          ; User-mode fill handler
    const  tav, 0x80 << 2           ; put starting register number into Indirect
    or     tav, tav, rfb            ; Pointer A
    mtsr   IPA, tav
    sub    tav, lr1, rfb            ; compute number of bytes to fill
    add    rab, rab, tav            ; adjust the allocate bound
    srl    tav, tav, 2              ; change byte count to word count
    sub    tav, tav, 1              ; make count zero-based
    mtsr   CR, tav                  ; set Count Remaining register
    loadm  0, 0, gr0, rfb           ; fill
    jmpi   tpc                      ; return to trapping procedure
    add    rfb, lr1, 0              ; adjust the free bound

7.1.2.7 THE REGISTER STACK LEAF FRAME 

A leaf procedure is one that does not call any other procedure. The incoming 
arguments of a leaf procedure are already allocated in the calling procedure's 
activation-record frame, and the leaf routine is not required to allocate locations for any outgoing 
arguments, frame pointer or return address (since it performs no call). Hence, a leaf 
procedure need not allocate a stack frame in the local registers, and can avoid the 
overhead of the procedure prologue and epilogue routines. Instead, a leaf routine can 
use a set of global registers for local variables; Global Register 96 through Global 
Register 124 are reserved for this purpose (among other purposes). If there is an 
insufficient number of global registers, the leaf procedure may allocate a frame on the 
Register Stack. 

7.1.2.8 LOCAL VARIABLES AND MEMORY-STACK FRAMES 

A called procedure can store its local variables and temporaries in space allocated in 
the Register Stack frame by the procedure prologue. The values are referenced as an 
offset from the rsp base address, using the Stack-Pointer addressing of the Am29050 
microprocessor local registers. No object in a register is aligned on anything smaller 
than a register boundary, and all objects take at least one register. 

Because there are 128 local registers, the total Register Stack activation-record size 
may not be greater than 128 words. If the callee needs more space for local variables 
and temporaries, it must allocate a frame on the Memory Stack to hold these objects. 
To allocate a Memory-Stack frame, the procedure prologue decrements the memory 
stack pointer (msp, in gr125). The procedure epilogue de-allocates the Memory-Stack 
frame by incrementing the msp. 

A procedure that extends the Memory Stack dynamically (e.g., using alloca()) must 
make a copy of the msp at procedure entry, before allocating the Memory-Stack 
frame. The msp is stored in the memory frame pointer (mfp) entry of the activation 
record in the Register Stack. The procedure then can change the msp during 
execution, according to the needs of dynamic allocation. On procedure return, the 
Memory-Stack frame is de-allocated using the mfp to restore the msp. A procedure that does 
not extend the Memory Stack dynamically need not have an mfp entry in its activation 
record. 

The following prologue and epilogue routines are used if there is no dynamic 
allocation of the Memory Stack during procedure execution, but a Memory Stack frame is 
otherwise required: 

prologue:
    sub    rsp, rsp, <rsize>*4      ; allocate register frame
    asgeu  V_SPILL, rsp, rab        ; call spill handler if needed
    add    fp, rsp, <size>*4        ; compute register frame pointer
    sub    msp, msp, <msize>        ; allocate memory frame
                                    ; msize = size of memory frame in words

epilogue:
    add    rsp, rsp, <rsize>*4      ; de-allocate register frame
    add    msp, msp, <msize>        ; de-allocate memory frame
    jmpi   lr0                      ; return
    asleu  V_FILL, fp, rfb          ; call fill handler if needed



The following prologue and epilogue routines are used if there is dynamic allocation of 
the Memory Stack during procedure execution: 



prologue:
    sub    rsp, rsp, <rsize>*4      ; allocate register frame
    asgeu  V_SPILL, rsp, rab        ; call spill handler if needed
    add    fp, rsp, <size>*4        ; compute register frame pointer
    add    lr{<rsize>-1}, msp, 0    ; save memory frame pointer
                                    ; lr{rsize-1} is last reg in new frame
    sub    msp, msp, <msize>        ; allocate memory frame,
                                    ; msize = size of memory frame in words

epilogue:
    add    msp, lr{<rsize>-1}, 0    ; restore memory stack pointer
                                    ; de-allocate memory frame
    add    rsp, rsp, <rsize>*4      ; de-allocate register frame
    nop                             ; cannot reference a local register here
    jmpi   lr0                      ; return
    asleu  V_FILL, fp, rfb          ; call fill handler if needed
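
The reason for saving the memory frame pointer can be seen in a small model. The class and function names are hypothetical and all sizes are arbitrary; the sketch only tracks the value of msp through a procedure that allocates dynamically.

```python
class MemoryStack:
    """Holds the memory stack pointer (msp) of a toy Memory Stack."""

    def __init__(self, msp):
        self.msp = msp


def run_procedure(stack, msize, dynamic_sizes):
    """Model one activation; return the lowest msp reached during execution."""
    mfp = stack.msp            # prologue: save msp as the memory frame pointer
    stack.msp -= msize         # prologue: allocate the fixed memory frame
    for n in dynamic_sizes:    # body: alloca()-style allocations move msp
        stack.msp -= n
    lowest = stack.msp
    stack.msp = mfp            # epilogue: restore msp from mfp
    return lowest


ms = MemoryStack(msp=0x2000)
lowest = run_procedure(ms, msize=64, dynamic_sizes=[16, 32])
```

Adding msize back in the epilogue would not be enough here, because the amount of dynamic allocation is not known when the epilogue is generated; restoring from the saved copy releases the fixed frame and every dynamic block at once.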

7.1.2.9 STATIC LINK POINTER 

Some programming languages (notably Pascal) permit nested procedure declara- 
tions, introducing the possibility that a procedure may reference variables and 
arguments which are defined and managed by another procedure. This other 
procedure is a static parent of the callee. A static parent is determined by the 
declarations of procedures in the program source, and is not necessarily the calling 
procedure; the calling procedure is the dynamic parent. Since procedures can be 
nested at a number of levels, a given procedure may have a number of hierarchically 
organized static parents. 

A called procedure can locate its dynamic parent and the variables of the dynamic 
parent because of the return address and frame pointer in the Register Stack. 
However, these are not adequate to locate variables of the static parent which may be 
referenced in the procedure. If such references appear in a procedure, the procedure 
must be provided with a static link pointer (slp). In the Am29050 microprocessor 
run-time organization, the slp is stored in Global Register 124. Since there can be a 
hierarchy of static parents, the slp points to the slp of the immediate parent, which in 
turn points to the slp of its immediate parent, and so on. Note that the contents of 
Global Register 124 may be destroyed by a procedure call, so a procedure needing to 
reference the variables of a static parent may need to preserve the slp until these 
references are no longer necessary. 
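
The chain of static links can be pictured as a linked list in memory. The following sketch is illustrative only: the function name is hypothetical, memory is modeled as a dictionary from address to stored word, and the addresses are invented for the example.

```python
def follow_static_chain(memory, slp, levels):
    """Follow the static link `levels` times and return the resulting pointer.

    memory maps an address to the word stored there; slp is the value held
    in Global Register 124 on entry to the procedure.
    """
    for _ in range(levels):
        slp = memory[slp]      # each link holds the slp of the next static parent
    return slp


# Hypothetical layout: parent's link at 0x100, grandparent's link at 0x200.
memory = {0x100: 0x200, 0x200: 0x300, 0x300: 0}
parent = follow_static_chain(memory, 0x100, levels=0)
grandparent = follow_static_chain(memory, 0x100, levels=1)
```

How the variables of a static parent are addressed relative to its link is a compiler decision that the text above does not specify, so it is left out of this model.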

7.1.2.10 FLOATING-POINT ACCUMULATORS 

A called procedure, if it needs to save and restore the floating-point accumulators, 
may save and restore the accumulators by treating them as double-precision even 
though they may contain single-precision values. Treating the floating-point accumu- 
lators as double-precision values is accomplished by saving the Floating-Point Envi- 
ronment Register, then forcing the Accumulator Format Field to 10 (double-precision). 
The accumulators and the Floating-Point Environment Register must be restored 
before returning to the calling procedure. Floating-point accumulators are not pre- 
served across procedure calls. 

7.1.2.11 TRANSPARENT PROCEDURES 

A transparent procedure is one that requires very little overhead for managing run- 
time storage. Transparent procedures are used in the Am29050 microprocessor 
run-time organization primarily to implement compiler-specific support functions, such 
as integer divide. 

A transparent routine does not allocate any activation-record frames. Parameters are 
passed to a transparent procedure using tav and the Indirect Pointer A, B, and C 
registers. The return address is stored in tpc. This convention allows a leaf procedure 
to call a transparent procedure without changing its status as a leaf procedure. There 
is a tight relationship between a compiler and the transparent procedures it calls. 
Some transparent procedures may need more temporary registers and the compiler 
must account for this. 

7.1.3 Register Usage Convention 

The Am29050 microprocessor run-time organization standardizes the uses of the 
local and global registers. This section summarizes register use and the nomencla- 
ture for register values: 

• GR1: Register stack pointer (rsp). 

• GR2-GR3: Condition Code Accumulator. 

• GR4-GR63: Unimplemented. 

• GR64-GR95: Reserved for operating-system use. 

• GR96-GR111: Procedure return values. Lower-numbered registers are used 
before higher-numbered registers. If more than 16 words are needed, the additional 
words are stored in memory (see GR123, large return pointer). These registers are 
also used for temporary values that are destroyed upon a procedure call. 

• GR112-GR115: Reserved for programmer. These registers are not used by the 
compiler, except as directed by the programmer. 

• GR116-GR120: Compiler temporaries. 

• GR121: Trap handler argument/temporary (tav). This register is used to 
communicate arguments to a software-invoked trap routine. It can be destroyed by 
the trap, but not by other traps and interrupts not explicitly generated by the 
program (for example, a Timer trap). 

• GR122: Trap handler return address/temporary (tpc). This register is also used by 
software-invoked traps. It can be destroyed by the trap, but not by other traps and 
interrupts not explicitly generated by the program (for example, a Timer trap). 

• GR123: Large return pointer/temporary (lrp). 

• GR124: Static link pointer/temporary (slp). 

• GR125: Memory stack pointer (msp). 

• GR126: Register allocate bound (rab). 

• GR127: Register free bound (rfb). 

• LR0: Return address. 

• LR1: Frame pointer. 

In this convention, registers must be handled by software according to system re- 
quirements. The following practices are recommended: 

• GR64-GR95 should be protected from User-mode access by the Register Bank 
Protect Register. 

• The contents of GR96-GR124 should be assumed destroyed by a procedure call, 
unless the procedure is a transparent procedure. 

• The contents of GR121 and GR122 should be assumed destroyed by any 
procedure call or any program-generated trap. 

• The contents of GR125 are always preserved by a procedure call. 

• The contents of GR126 and GR127 are managed by the spill and fill handlers and 
should not be modified except by these handlers. 

7.1.4 Example of a Complex Procedure Call 

The following code sequence demonstrates a complex procedure call, illustrating how 
registers are used in the run-time organization: 

caller:
    (other code)
    add    lrp, msp, 32             ; pass lrp
    add    slp, msp, 120            ; pass a static link
    call   lr0, callee
    const  lr2, 1                   ; 1 as first argument
    (other code)

callee:
    const  tav, (126-2)*4           ; giant register allocation
    sub    rsp, rsp, tav            ; allocate register frame
    asgeu  V_SPILL, rsp, rab
    const  tav, (126-2)*4 + (3*4)   ; incoming arguments and overhead
    add    fp, rsp, tav             ; create frame pointer
    add    lr123, msp, 0            ; for dynamic Memory-Stack allocation
    const  tav, memory_frame_size   ; big msize
    consth tav, memory_frame_size   ; high half of msize
    sub    msp, msp, tav            ; allocate memory frame
    add    lr18, lrp, 0             ; save lrp for later
    add    lr19, slp, 0             ; save slp for later

    (other code)

    add    msp, lr123, 0            ; de-allocate memory frame
    const  tav, (126-2)*4           ; giant allocation size
    add    rsp, rsp, tav            ; de-allocate register frame
    const  gr96, 1                  ; return value
    jmpi   lr0                      ; return to caller
    asleu  V_FILL, fp, rfb          ; ensure caller's registers in frame



7.1.5 Trace-Back Tags 

A trace-back tag is either one or two words of information included at the beginning of 
every procedure. This information permits a debug routine to determine the sequence 
of procedure calls and the values of program variables at a given point in execution. 
The trace-back tag describes the memory frame size and the number of local registers 
used by the associated procedure. A one-word tag is used if the memory frame 
size is less than 2K words; otherwise, the two-word tag is used. Regardless of tag 
length, the tag directly precedes the first instruction of the procedure. Figure 7-7 
shows the format of the trace-back tags. 

The first word of a trace-back tag starts with the invalid operation code 00 (hexadecimal). 
This unique, invalid instruction operation code allows the debugger to locate the 
beginning of the procedure in the absence of other information related to the beginning 
of the procedure, such as from a symbol table. This is particularly useful after a 
program crash, in which case the debug routine may have only an arbitrary instruction 
address within a procedure. The call sequence up to the current point in execution 
can be determined from the rsize and msize values in the trace-back tag. However, 
for procedures that perform dynamic stack allocation (e.g., using alloca()), the 
memory frame pointer must be used. 

Figure 7-7    Trace-Back Tags 

One-word tag (bit 31 at left): 

        00000000 | 0 | M | T | argcount | Reserved | msize | Reserved 

Two-word tag (bit 31 at left; the msize word precedes the word that carries the opcode): 

        msize 
        00000000 | 1 | M | T | argcount | Reserved | Reserved 

The tag word immediately preceding a procedure contains the following fields. 
Reserved fields must be zero. 

Bits     Item        Description 

31-24    opcode      Hexadecimal 00 (an invalid opcode) 
23       tag type    0/one-word tag; 1/two-word tag 
22       m           0/no mfp; 1/mfp used 
21       t           0/normal; 1/transparent procedure 
20-16    argcount    Number of arguments in registers (includes lr0 and lr1) 
15-11    Reserved    Reserved, must be zero 
10-3     msize       Memory frame size in doublewords (if bit 23 is 0) 
                     or reserved (if bit 23 is 1) 
2-0      Reserved    Reserved, must be zero 

If the procedure uses a Memory-Stack frame size of 2K words or more, the msize field is 
contained in the second tag word, which immediately precedes the first tag word. 
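
The field layout above can be exercised with a short Python model. This is only a sketch of how a debugger might split the tag word; the function name, the dictionary keys, and the `well_formed` check are illustrative assumptions and are not part of any AMD tool.

```python
def decode_trace_back_tag(word):
    """Split the tag word that directly precedes a procedure into its fields.

    Bit positions follow the field table above. The names used for the
    dictionary keys are hypothetical.
    """
    fields = {
        "opcode":   (word >> 24) & 0xFF,  # bits 31-24, expected to be 0x00
        "two_word": (word >> 23) & 0x1,   # bit 23: 0 = one-word, 1 = two-word tag
        "m":        (word >> 22) & 0x1,   # bit 22: 1 = mfp used
        "t":        (word >> 21) & 0x1,   # bit 21: 1 = transparent procedure
        "argcount": (word >> 16) & 0x1F,  # bits 20-16
        "msize":    (word >> 3) & 0xFF,   # bits 10-3, meaningful only if bit 23 is 0
    }
    reserved_high = (word >> 11) & 0x1F   # bits 15-11, must be zero
    reserved_low = word & 0x7             # bits 2-0, must be zero
    fields["well_formed"] = (
        fields["opcode"] == 0 and reserved_high == 0 and reserved_low == 0
    )
    return fields


if __name__ == "__main__":
    # 0x00200000 has only bit 21 set: a one-word tag for a transparent procedure.
    print(decode_trace_back_tag(0x00200000))
```

A debugger scanning backward from an arbitrary instruction address would apply a check of this kind to each candidate word until it finds one whose opcode byte is 00.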

7.2 APPLICATIONS-PROGRAMMING CONSIDERATIONS 

This section discusses topics of general concern in the implementation of applications 
programs. 

7.2.1 Addressing General-Purpose Registers Indirectly 

Registers in the processor usually are addressed directly by fields within instructions. 
However, indirect addressing of registers may be required in some situations, such as 
when a program pointer is known to point to a variable that is resident in the register 
file. 

Three special registers— Indirect Pointers A, B, and C— are provided so that separate 
indirect register numbers can be set for each of the source and destination operands 
within an instruction. Indirect Pointer C corresponds to the destination register RC, 
Indirect Pointer A corresponds to the RA operand register, and Indirect Pointer B 
corresponds to the RB operand register. 

A given indirect pointer (the value in the corresponding register) is used to address 
the register file whenever Global Register 0 is specified as a source or destination 
register. For example, a value of 0 in the RA field of an instruction causes the content 
of the Indirect Pointer A Register to be used to access the RA operand. 

The indirect pointers can be set by the four multiply instructions, the floating-point 
instructions, Move To Special Register instructions, and by the instructions EMULATE, 
DIVIDE, DIVIDU, and Set Indirect Pointers (SETIP). The Move To Special 
Register instructions set the indirect pointers individually as special-purpose registers. 
Of the remaining instructions, all but the EMULATE instruction set all three indirect 
pointers simultaneously, deriving the values that are written into the pointers from the 
instruction fields RC, RA, and RB. The EMULATE instruction sets all three indirect 
pointers, but only the Indirect Pointer A and Indirect Pointer B registers are written 
with meaningful values. They may be destroyed by DIVIDE, DIVIDU, MULTIPLY, 
MULTIPLU, MULTM, MULTMU, and the floating-point instructions. 



When an indirect pointer is set by a Move To Special Register, bits 9-2 of the source 
operand are copied to corresponding bits in the indirect pointer. This allows the 
addressing of general-purpose registers, via the indirect pointers, to be consistent with 
the addressing of words in external memories and devices. 
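
As a rough illustration of why bits 9-2 are used, the following Python sketch treats the source operand as a byte address of a word. The function names are hypothetical, and reading the 8-bit value in bits 9-2 as the register number is an assumption drawn from the paragraph above.

```python
def indirect_pointer_from_operand(operand):
    """Keep bits 9-2 of the source operand, as a Move To Special Register would."""
    return operand & 0x3FC


def register_number(pointer):
    """Assumed interpretation: bits 9-2 of the pointer hold the register number."""
    return (pointer >> 2) & 0xFF


if __name__ == "__main__":
    # A word at byte offset 4 * 121 maps to register number 121.
    pointer = indirect_pointer_from_operand(4 * 121)
    print(hex(pointer), register_number(pointer))
```

Because the two low-order bits are ignored, a pointer that steps through memory in units of four bytes steps through the register file one register at a time.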

When the indirect pointers are set from instruction fields, the resulting values reflect 
the Stack-Pointer addition that is performed on local registers. In addition, register 
bank-protection checking is performed on the values that are loaded. A Protection 
Violation trap occurs if the values represent registers that cannot be accessed. The 
indirect pointers may thus be used to access exactly those operands that would be 
accessed by the instruction fields setting the indirect pointers. Consequently, a routine 
that emulates an instruction operation can access, with no overhead, the source and 
destination registers for the instruction being emulated. No copying of arguments and 
results needs to be done. 

The indirect pointers are also set by the floating-point, MULTIPLY, MULTM, MULTIPLU, 
and MULTMU instructions when these cause exceptions, to allow the exception 
handler to access the instruction operands. 

When using indirect register addressing, at least one cycle of delay must separate 
any instruction that sets an indirect pointer and any instruction which de-references 
that pointer. This restriction is the result of processor pipelining (see Section 7.4.3). 

7.2.2 Run-Time Checking 

The assert instructions provide programs with an efficient means of comparing two 
values and causing a trap when a specified relation between the two values is not 
satisfied. The instructions assert that some specified relation is true, and trap if the 
relation is not true. This allows run-time checking— such as checking that a computed 
array index is within the boundaries of the storage for an array— to be performed with 
a minimum performance penalty. 

Assert instructions are available for comparing two signed or unsigned operands. The 
following relations are supported: equal-to, not-equal-to, less-than, 
less-than-or-equal-to, greater-than, and greater-than-or-equal-to. 

The assert instructions specify a vector number for the trap. However, only vector 
numbers 64 through 255 (inclusive) may be specified by User-mode programs. If a 
User-mode assert instruction causes a trap, and the vector number is between 0 and 
63 inclusive, a Protection Violation trap occurs, instead of the specified trap. 

Since the assert instructions allow the specification of the vector number, several 
traps may be defined in the system, for different situations detected by the assert 
instructions. 
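
The vector-number rule can be summarized in a few lines of Python. This is a behavioral sketch only; the function name and the string used to stand for the Protection Violation trap are illustrative assumptions.

```python
PROTECTION_VIOLATION = "Protection Violation"  # placeholder label, not a vector number


def assert_outcome(relation_holds, vector, user_mode):
    """Return None if no trap is taken, otherwise the trap that results."""
    if not 0 <= vector <= 255:
        raise ValueError("vector number must be in the range 0 to 255")
    if relation_holds:
        return None                      # assertion true: no trap
    if user_mode and vector <= 63:
        return PROTECTION_VIOLATION      # User mode may not raise vectors 0-63
    return vector                        # the specified trap is taken


if __name__ == "__main__":
    index, bound = 12, 10
    # Bounds check: assert index < bound, trapping through vector 70 on failure.
    print(assert_outcome(index < bound, 70, user_mode=True))
```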

7.2.3 Operating System Calls 

An applications program can request a service from the operating system by using 
the following instruction: 

        asneq   System_Routine, gr1, gr1 

This instruction always creates a trap, since it attempts to assert that the content of a 
register is not equal to itself (the register number used here is irrelevant, as long as 
the register is otherwise accessible). 

The System_Routine vector number specified by the instruction invokes the execution 
of the operating system routine that provides the requested service. This vector number 
may have any value between 64 and 255, inclusive (vector numbers 0 through 63 
are pre-defined or reserved). Thus, as many as 192 different operating-system routines 
may be invoked from the applications program. 

In cases where the indirect pointers may be used, the EMULATE instruction allows 
two operand/result registers to be specified to the operating-system routine. The 
instruction is: 

        emulate System_Routine, lr3, lr6 

In this case, the System_Routine vector number performs the same function as in the 
previous example. Here, however, LR3 and LR6 are specified as operand registers 
and/or result registers (these particular registers are used only for illustration). The 
operating-system routine has access to these registers via the indirect pointers, which 
allows flexible communication. 

7.2.4 Multi-Precision Integer Addition And Subtraction 

The processor allows the Carry (C) bit of the ALU Status Register to be used as an 
operand for add and subtract instructions. This provides for the addition and subtraction 
of operands which are greater than 32 bits in length. For example, the following 
code implements a 96-bit addition with signed overflow detection. 

        add     lr7, gr96, lr2 
        addc    lr8, gr97, lr3 
        addcs   lr9, gr98, lr4 

Global registers GR96-GR98 contain the first operand, local registers LR2-LR4 
contain the second operand, and local registers LR7-LR9 contain the result. The first 
two add instructions set the C bit, which is used by the last two instructions. If the 
addition causes a signed overflow, then an Out of Range trap occurs; overflow is 
detected by the final instruction. 
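
The carry chain in this sequence can be modeled in Python as follows. The sketch assumes each operand is given as three 32-bit words, least-significant word first (matching the order in which the registers are consumed above); the helper names are illustrative, and the overflow flag stands in for the Out of Range trap.

```python
MASK32 = 0xFFFFFFFF


def add32(a, b, carry_in=0):
    """32-bit add returning (result, carry_out), like ADD or ADDC setting C."""
    total = (a & MASK32) + (b & MASK32) + carry_in
    return total & MASK32, total >> 32


def add96(x, y):
    """Add two 96-bit values held as [low, middle, high] word lists."""
    r0, carry = add32(x[0], y[0])          # add
    r1, carry = add32(x[1], y[1], carry)   # addc
    r2, carry = add32(x[2], y[2], carry)   # addcs
    # Signed overflow: operands share a sign that the result does not have.
    sign_x = (x[2] >> 31) & 1
    sign_y = (y[2] >> 31) & 1
    sign_r = (r2 >> 31) & 1
    overflow = sign_x == sign_y and sign_r != sign_x
    return [r0, r1, r2], overflow


if __name__ == "__main__":
    print(add96([0xFFFFFFFF, 0xFFFFFFFF, 0], [1, 0, 0]))
```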

7.2.5 Integer Multiplication 

The Am29050 microprocessor directly executes the integer-multiplication instructions 
MULTIPLY, MULTIPLU, MULTM, and MULTMU (these are implemented using traps 
in the Am29000 microprocessor). The Am29050 microprocessor implements the 
multiply-step instructions MUL, MULU, and MULL for compatibility, but new code 
generated for the Am29050 microprocessor should take advantage of the faster 
integer multiply instructions. 

The MULTIPLY and MULTIPLU instructions multiply two 32-bit integers, giving a 
32-bit result. MULTIPLY is used for signed integers, and MULTIPLU is used for 
unsigned integers. Overflow of the 32-bit result is detected when the Integer Multiplication 
Overflow Exception Mask bit (MO) of the Integer Environment Register is 0. When the 
MO bit is 0, the MULTIPLY and MULTIPLU operations cause an Out of Range trap 
upon overflow of a 32-bit signed or unsigned result, respectively. 

In general, multiplying 32-bit integers produces a 64-bit result. The most-significant 32 
bits of a signed or unsigned result are generated by the MULTM and MULTMU 
instructions, respectively. To obtain a full 64-bit result, a MULTIPLY or MULTIPLU 
instruction is followed by a MULTM or MULTMU instruction: 

; 32 bit * 32 bit -> 64 bit signed multiply 
; Input: multiplicand in lr2, multiplier in lr3 
; Output: result most-significant word in gr96, result 
;         least-significant word in gr97 

        multiply gr97, lr2, lr3     ; get lsb's 
        multm    gr96, lr2, lr3     ; get msb's 

; 32 bit * 32 bit -> 64 bit unsigned multiply 
; Input: multiplicand in lr2, multiplier in lr3 
; Output: result most-significant word in gr96, result 
;         least-significant word in gr97 

        multiplu gr97, lr2, lr3     ; get lsb's 
        multmu   gr96, lr2, lr3     ; get msb's 

The operation producing the most-significant bits of the 64-bit result is fully pipelined 
with the operation producing the least-significant bits, so generating a full, 64-bit result 
takes one more cycle than generating a 32-bit result. Note that the MO bit should be 1 
to disable the detection of overflow when obtaining a 64-bit result; 64-bit results 
cannot overflow. 
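
The relationship between the low and high halves can be checked with a small Python model. It is a sketch of the arithmetic only (not of timing or traps); the function names are illustrative, and the returned overflow flag simply reports whether the full product fails to fit in 32 bits.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(value):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def multiply_signed(a, b):
    """Return (high, low, overflow) for a signed 32 x 32 multiply."""
    product = to_signed32(a) * to_signed32(b)
    low = product & MASK32               # what MULTIPLY would deliver
    high = (product >> 32) & MASK32      # what MULTM would deliver
    overflow = not -(1 << 31) <= product < (1 << 31)
    return high, low, overflow


def multiply_unsigned(a, b):
    """Return (high, low, overflow) for an unsigned 32 x 32 multiply."""
    product = (a & MASK32) * (b & MASK32)
    return product >> 32, product & MASK32, product > MASK32


if __name__ == "__main__":
    print(multiply_signed(0xFFFFFFFF, 2))    # -1 * 2
    print(multiply_unsigned(0xFFFFFFFF, 2))
```

The same bit pattern gives different high words in the two cases, which is why separate signed and unsigned instructions exist for the most-significant half.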



7.2.6 Integer Division 

The processor performs integer division by a series of divide step instructions, rather 
than by a single instruction. Floating-point division is performed by hardware. When 
the divisor is a power of 2, and the dividend is unsigned, the divide should be 
accomplished by a right shift. 

If a program requires the division of two integers, the required sequence of divide 
steps may be executed in-line, or executed in a divide routine called as a procedure. 
It may be beneficial to precede a full divide procedure with a routine to discover 
whether or not the number of divide steps may be reduced. This reduction is possible 
when the operands do not use all of the available 32 bits of precision. 
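
The idea of reducing the number of steps can be illustrated with a Python sketch. It uses a plain restoring shift-and-subtract loop, which is an assumption chosen for clarity and is not a model of the processor's own divide-step instructions; the function name is hypothetical.

```python
def divide_unsigned(dividend, divisor):
    """Shift-and-subtract division that runs one step per significant dividend bit.

    Returns (quotient, remainder, steps). A full 32-bit dividend needs 32
    steps; a smaller dividend needs fewer.
    """
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    steps = dividend.bit_length()        # number of bits actually in use
    quotient = 0
    remainder = 0
    for bit in range(steps - 1, -1, -1):
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        quotient <<= 1
        if remainder >= divisor:         # subtract succeeds: quotient bit is 1
            remainder -= divisor
            quotient |= 1
    return quotient, remainder, steps


if __name__ == "__main__":
    print(divide_unsigned(100, 7))       # a 7-bit dividend needs only 7 steps
    print(100 >> 3)                      # dividing by 8 is just a right shift
```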

The following routine divides a 64-bit, unsigned dividend by a 32-bit unsigned divisor: 

; 64 bit / 32 bit -> 32 bit unsigned divide 
; Input: most-significant dividend word in lr2, least-significant dividend word in lr3, 
;        divisor in lr4 
; Output: quotient in gr96, remainder in gr97 

UDiv64: 
        mtsr    Q, lr3              ; put least-significant word of the dividend in 
                                    ; the Q register 
        div0    gr97, lr2           ; perform initial divide step 
        .rep    31                  ; expand out 31 copies of the next 
                                    ; instruction in-line 
        div     gr97, gr97, lr4     ; total of 30 more divide steps 
        .endr 
        divl    gr97, gr97, lr4     ; perform last step 
        divrem  gr97, gr97, lr4     ; compute remainder 
        mfsr    gr96, Q             ; get the quotient 

The following routine divides a 32-bit unsigned dividend by a 32-bit unsigned divisor: 

; 32 bit / 32 bit -> 32 bit unsigned divide 
; Input: dividend word in lr2, divisor in lr3 
; Output: quotient in gr96, remainder in gr97 

UDiv32: 
        mtsr    Q, lr2              ; put the dividend in the Q register 
        div0    gr97, 0             ; perform initial divide step, zeroing out 
                                    ; the upper bits of the dividend 
        .rep    31                  ; expand out 31 copies of the next 
                                    ; instruction in-line 
        div     gr97, gr97, lr3     ; total of 30 more divide steps 
        .endr 
        divl    gr97, gr97, lr3     ; perform last step 
        divrem  gr97, gr97, lr3     ; compute remainder 
        mfsr    gr96, Q             ; get the quotient 

The following routine divides a 32-bit signed dividend by a 32-bit signed divisor. It also 
traps division by zero. Because the divide-step instructions only operate on unsigned 
operands, extra code is required to perform sign checking and conversion: 

; 32 bit / 32 bit signed divide, called by: 
; 
;       call    tpc, SDiv32                     ; call the divide routine 
;       setip   dst_reg, src1_reg, src2_reg     ; passing pointers to the operand 
;                                               ; registers in the delay slot 
; 
; Input: dividend and divisor in the registers pointed to by the indirect-pointer 
;        registers IPA and IPB 
; Output: result quotient in the register pointed to by IPC, remainder left in Temp0 
; Used: return address in tpc, special register Q 
; Destroyed: previous contents of registers tav, Temp0-Temp2 
; Symbolic register names: 

        .reg    Temp0, gr116 
        .reg    Temp1, gr119 
        .reg    Temp2, gr120 
        .reg    tpc, gr122 

        .word   0x00200000          ; Debugger tag word 

SDiv32: 
        const   Temp1, 0 
        asneq   V_DIVBYZERO, Temp1, gr0 
                                    ; check for divide by zero with an assert 
        add     Temp0, gr0, 0       ; get dividend from indirect pointer 
        jmpf    Temp0, pdividend    ; is it negative (jmpf is also "jmppos") 
        add     Temp2, Temp1, gr0   ; get divisor from indirect pointer 
        const   Temp1, 3            ; set negative result and remainder flags 
        subr    Temp0, Temp0, 0     ; make dividend positive 
pdividend: 
        jmpf    Temp2, pdivisor     ; is divisor negative? 
        mtsr    Q, Temp0            ; copy dividend to register in delay slot 
                                    ; of the jump 
        xor     Temp1, Temp1, 1     ; turn off negative result flag 
        subr    Temp2, Temp2, 0     ; make divisor positive 
pdivisor: 
        div0    Temp0, 0            ; initialize 
        .rep    31                  ; expand out 31 copies of the next 
                                    ; instruction in-line 
        div     Temp0, Temp0, Temp2 ; total of 30 more divide steps 
        .endr 
        divl    Temp0, Temp0, Temp2 ; perform last divide step 
        divrem  Temp0, Temp0, Temp2 ; get positive remainder 
        mfsr    Temp2, Q            ; get positive quotient 
        sll     Temp1, Temp1, 30    ; copy negative remainder flag to test bit 
        jmpf    Temp1, premainder   ; if it is not set, remainder is ok 
        sll     Temp1, Temp1, 1     ; copy negative result flag to test bit 
        subr    Temp0, Temp0, 0     ; negate remainder 
premainder: 
        jmpfi   Temp1, tpc          ; return to caller if result is positive 
        add     gr0, Temp2, 0       ; copying quotient to the result register 
                                    ; in the delay slot 
        jmpi    tpc                 ; else return to caller, 
        subr    gr0, Temp2, 0       ; negating the quotient in the delay slot 

7.2.7 Rounding 







Floating-point operations can be performed in one of four rounding modes defined in 
the IEEE Standard for Binary Floating-Point Arithmetic (ANSI/IEEE Std. 754-1985). 
These modes are: 

Round to Nearest: The result produced is the representable value nearest to the 
infinitely precise result. It can happen that the infinitely precise result falls exactly 
halfway between two representable values; in this case, the result produced will be 
whichever of those two representable values has a fractional part whose 
least-significant bit is 0. 

Round Toward +∞: The result produced is the representable value closest to but no 
less than the infinitely precise result. 

Round Toward -∞: The result produced is the representable value closest to but no 
greater than the infinitely precise result. 

Round Toward 0: The result produced is the representable value closest to but no 
greater in magnitude than the infinitely precise result. 
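
The four modes can be compared side by side with a small Python sketch. To keep it short, the integers stand in for the representable values and an exact fraction stands in for the infinitely precise result, so "least-significant bit is 0" becomes "even". The function and mode names are illustrative assumptions.

```python
from fractions import Fraction
import math


def round_exact(exact, mode):
    """Round an exact Fraction to a neighboring integer under one of four modes."""
    lower = math.floor(exact)            # closest representable value <= exact
    upper = math.ceil(exact)             # closest representable value >= exact
    if mode == "nearest":
        if exact - lower < upper - exact:
            return lower
        if exact - lower > upper - exact:
            return upper
        return lower if lower % 2 == 0 else upper   # tie: pick the even value
    if mode == "toward_plus_infinity":
        return upper
    if mode == "toward_minus_infinity":
        return lower
    if mode == "toward_zero":
        return upper if exact < 0 else lower
    raise ValueError("unknown rounding mode")


if __name__ == "__main__":
    for mode in ("nearest", "toward_plus_infinity",
                 "toward_minus_infinity", "toward_zero"):
        print(mode, round_exact(Fraction(5, 2), mode), round_exact(Fraction(-5, 2), mode))
```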

The floating-point rounding mode is determined by the FRM field of the Floating-Point 
Environment Register. The following operations are affected by the value in the FRM 
field: 

• FADD, DADD, FSUB, DSUB, FMUL, DMUL, FDIV, DDIV, 
FMAC, DMAC, FMSM, DMSM, and SQRT 

• MFACC and MTACC 

• CONVERT, when the instruction field RND is 100. 

The value in the FRM field has no effect on the floating-point comparison operations, 
the CLASS operation, or the FDMUL operation. 

7.2.8 Fast-Float Mode 

The 29K Family fully supports the IEEE Standard for Binary Floating-Point Arithmetic 
(ANSI/IEEE Std. 754-1985). For some floating-point implementations, however, a 
significant speed advantage can be realized by disabling certain supported features. 

For the Am29050 microprocessor, a fast-float mode has been provided to disable the 
processing of denormalized numbers. Although the handling of denormalized numbers 
in the Am29050 microprocessor is always transparent to the user, the processor 
will sometimes require extra cycles to process denormalized operands. This adds 
both to processing time and to the statistical variability of the processing time required 
for a given number of computations. 

The Fast-Float mode is enabled by setting the Fast-Float Select (FF) bit of the 
Floating-Point Environment Register. In the fast-float mode, denormalized numbers are 
handled as follows: 

1. A denormalized source operand is converted to a zero of the same sign before the 
operation is performed; this conversion does not affect the value of the operand in 
the source register. This conversion does not signal an inexact result exception, 
because the Fast-Float mode considers a denormalized number to be nothing 
more than a representation of zero. This occurs without adding extra cycles. 

2. An operation producing an infinitely precise result smaller than the smallest 
normal number in the destination format will produce a zero result of the same 
sign as the infinitely precise result; the underflow and inexact exceptions will be 
reported. 
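
Rule 1 can be expressed on raw IEEE single-precision bit patterns as in the following Python sketch. The function names are illustrative; the code models only the operand conversion, not cycle counts or exception reporting.

```python
SIGN_MASK = 0x80000000
EXPONENT_MASK = 0x7F800000
FRACTION_MASK = 0x007FFFFF


def is_denormal_single(bits):
    """True if a 32-bit pattern encodes a denormalized single-precision number."""
    return (bits & EXPONENT_MASK) == 0 and (bits & FRACTION_MASK) != 0


def fast_float_source(bits):
    """Value seen by the operation in fast-float mode: denormals become signed zero.

    The register holding the operand is left unchanged; only the value
    used by the operation differs.
    """
    if is_denormal_single(bits):
        return bits & SIGN_MASK
    return bits


if __name__ == "__main__":
    print(hex(fast_float_source(0x80000001)))   # negative denormal becomes -0.0
    print(hex(fast_float_source(0x3F800000)))   # 1.0 is passed through
```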

The instructions MTACC, FMAC, DMAC, FMSM, and DMSM use the Fast-Float 
mode, regardless of the FF bit. 

7.2.9 Complementing a Boolean 

To complement a Boolean in the processor's format, only the most-significant bit of 
the Boolean word should be considered, since the least-significant 31 bits may or may 
not be zeros. This is accomplished by the following instruction: 

        cpge    gr96, gr96, 0 

The Boolean is in GR96 in this example. This instruction is based on the observation 
that a Boolean TRUE is a negative integer, since the Boolean bit coincides with the 
integer sign bit. If the operand of this instruction is a negative integer (i.e., TRUE), the 
result is the Boolean FALSE. If the operand is non-negative (i.e., the Boolean 
FALSE), the result is TRUE. 
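
The effect of comparing against zero can be seen in a short Python model. It assumes, for illustration, that the compare delivers TRUE as a word with only the most-significant bit set and FALSE as zero; the function name is hypothetical.

```python
TRUE = 0x80000000    # assumed encoding of a freshly produced Boolean TRUE
FALSE = 0x00000000


def complement_boolean(word):
    """Model of 'signed word >= 0', which inverts the Boolean held in bit 31."""
    word &= 0xFFFFFFFF
    signed = word - (1 << 32) if word & 0x80000000 else word
    return TRUE if signed >= 0 else FALSE


if __name__ == "__main__":
    # The low 31 bits of the input do not matter, only the sign bit does.
    print(hex(complement_boolean(0x80000123)))
    print(hex(complement_boolean(0x00000123)))
```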

7.2.10 Using the Floating-Point Accumulators 

The Floating-Point Accumulators (ACC0 to ACC3) provide an extra source or destination 
register for the multiply-accumulate (FMAC, DMAC) and multiply-sum (FMSM, 
DMSM) instructions. The FMAC and DMAC instructions can be used to evaluate 
sum-of-products calculations, such as those found in vector or matrix multiplication. 
The FMSM and DMSM instructions are used when the multiplier is a fixed value, such 
as in polynomial evaluation using Horner's Rule, or the SAXPY or DAXPY (Single/ 
Double precision A times X Plus Y) vector routines used in Gaussian Elimination. 

7.2.10.1 MATRIX MULTIPLICATION USING THE FMAC INSTRUCTION 

One of the operations performed frequently in 3-dimensional (3-D) graphics systems 
is the rotation and translation of a 3-D vector. This is accomplished by multiplying a 
4-by-1 vector and a 4-by-4 matrix. In this case, the four accumulators are used to 
interleave four independent sum-of-products calculations. This eliminates pipeline 
stalls caused by dependencies on the accumulator values. 

For the FMAC and DMAC instructions, accumulated values can overflow, especially 
when accumulating many terms. The FMAC and DMAC instructions can specify the 
accumulator format independent of the other operands, allowing the accumulated 
values to be maintained in the double-precision format even though the operations 
are performed in the single-precision format. This is accomplished with no 
performance penalty. 

; Multiply a 4 x 1 vector times a 4 x 4 matrix. Four accumulators are 
; used to interleave four independent sum-of-product evaluations. 
; This code takes 22 cycles to complete 28 floating-point 
; operations. 
; 
; Input: 4 x 4 matrix (a) in registers lr2 - lr17; 4 x 1 vector 
;        (b) in registers lr18 - lr21 
; Output: 4 x 1 result vector (c) in lr22 - lr25. 
; 
; The first four instructions initialize the accumulators with 
; the first four independent products. The FMAC function field 
; is set to 4, specifying that the operation to be performed is 
; a*b + 0.0 

        fmac    4, 0, lr2, lr18     ; acc0 <- a11 * b1 
        fmac    4, 1, lr6, lr18     ; acc1 <- a21 * b1 
        fmac    4, 2, lr10, lr18    ; acc2 <- a31 * b1 
        fmac    4, 3, lr14, lr18    ; acc3 <- a41 * b1 

; the remaining FMAC operations continue the four independent evaluations: 

        fmac    0, 0, lr3, lr19     ; acc0 <- a12 * b2 + acc0 
        fmac    0, 1, lr7, lr19     ; acc1 <- a22 * b2 + acc1 
        fmac    0, 2, lr11, lr19    ; acc2 <- a32 * b2 + acc2 
        fmac    0, 3, lr15, lr19    ; acc3 <- a42 * b2 + acc3 

        fmac    0, 0, lr4, lr20     ; acc0 <- a13 * b3 + acc0 
        fmac    0, 1, lr8, lr20     ; acc1 <- a23 * b3 + acc1 
        fmac    0, 2, lr12, lr20    ; acc2 <- a33 * b3 + acc2 
        fmac    0, 3, lr16, lr20    ; acc3 <- a43 * b3 + acc3 

        fmac    0, 0, lr5, lr21     ; acc0 <- a14 * b4 + acc0 
        fmac    0, 1, lr9, lr21     ; acc1 <- a24 * b4 + acc1 
        fmac    0, 2, lr13, lr21    ; acc2 <- a34 * b4 + acc2 
        fmac    0, 3, lr17, lr21    ; acc3 <- a44 * b4 + acc3 

; the final four instructions move the accumulated sums into 
; the destination registers: 

        mfacc   lr22, 1, 0          ; c0 <- acc0 
        mfacc   lr23, 1, 1          ; c1 <- acc1 
        mfacc   lr24, 1, 2          ; c2 <- acc2 
        mfacc   lr25, 1, 3          ; c3 <- acc3 

7.2.10.2 SAXPY USING THE MSM INSTRUCTION 

The SAXPY (Single Precision A Times X Plus Y) routine is used heavily to solve 
systems of linear equations via Gaussian Elimination. The following example SAXPY 
routine operates on vectors of 16 elements: 

; SAXPY of size 16, using the FMSM instruction. 
; inputs: constant multiplier A in lr2 
;         address of X vector in lr3 
;         address of Y vector in lr4 
;         address of result vector in lr5 
; assumes ACF is 01 

; first, load in the X vector using the LOADM instruction. This 
; operation works with burst-access memory at 1 word per cycle: 
        mtsrim  cr, 15              ; load 16 words 
        loadm   0, 0, gr96, lr3     ; read in the X vector 

; load in the Y vector the same way... 
        mtsrim  cr, 15 
        loadm   0, 0, lr6, lr4      ; read in the Y vector 
        mtacc   lr2, 0, 0           ; initialize with multiplier A 

; perform 16 FMSM instructions on the two vectors 
        fmsm    gr96, gr96, lr6     ; gr96 = gr96 * ACC0 + lr6 
        fmsm    gr97, gr97, lr7     ; gr97 = gr97 * ACC0 + lr7 
        fmsm    gr98, gr98, lr8     ; gr98 = gr98 * ACC0 + lr8 

        fmsm    gr111, gr111, lr21  ; gr111 = gr111 * ACC0 + lr21 

; store out the result vector 
        mtsrim  cr, 15 
        storem  0, 0, gr96, lr5 
7.2.11 Using the Condition Code Accumulator 

The Condition Code Accumulator can be used to concatenate the Boolean results of 
several operations into a single condition code. The condition code can then be used 
as an operand in further operations, for example, as a control parameter for 
conditional branches. 

The Condition Code Accumulator Register is accessed via Global Registers 2 and 3. 
If Global Register 2 (CCA) is specified as the destination of an operation, then the 
32-bit operation result is written to the Condition Code Accumulator Register. If Global 
Register 3 (CCA-shift) is the destination, then the Condition Code Accumulator Register 
is shifted left one bit and the most-significant bit of the operation result is placed in 
the least-significant bit of the register. The contents of the Condition Code Accumulator 
Register are read by specifying Global Register 2 as a source operand of an 
instruction. 
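
The three access paths described above can be modeled as a small Python class. The class and method names are illustrative assumptions; the model covers only the data movement, not the source and destination restrictions.

```python
MASK32 = 0xFFFFFFFF


class ConditionCodeAccumulator:
    """Behavioral model of the Condition Code Accumulator Register."""

    def __init__(self):
        self.value = 0

    def write(self, result):
        """Destination is Global Register 2 (CCA): store the whole 32-bit result."""
        self.value = result & MASK32

    def write_shift(self, result):
        """Destination is Global Register 3 (CCA-shift): shift left by one bit
        and insert the most-significant bit of the result at bit 0."""
        self.value = ((self.value << 1) | ((result >> 31) & 1)) & MASK32

    def read(self):
        """Source is Global Register 2 (CCA)."""
        return self.value


if __name__ == "__main__":
    cca = ConditionCodeAccumulator()
    cca.write(0)
    for boolean in (0x80000000, 0x00000000, 0x80000000):   # TRUE, FALSE, TRUE
        cca.write_shift(boolean)
    print(bin(cca.read()))
```

After the three shifts the accumulator holds the three Boolean results packed into its low-order bits, oldest result in the highest of the three positions.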

The following restrictions apply to the use of the Condition Code Accumulator: 

CCA as Source: The CCA register can be specified as a source for any instruction 
except those performed in the Floating-Point Unit. (The instructions performed in the 
FPU are: all floating-point instructions, CLASS, CONVERT, MULTIPLY, MULTIPLU, 
MULTM, and MULTMU.) 

CCA as Destination: The CCA register can be specified as the destination of the 
following instructions only: ADD, SUB, and the constant instructions (CONST, 
CONSTH, CONSTHZ, and CONSTN). 

CCA-shift as Source: The CCA-shift register cannot be specified as a source. 
Specifying CCA-shift as a source will produce an unpredictable result. 

CCA-shift as Destination: The CCA-shift register can be specified as the destination 
of any instruction except LOAD, LOADL, LOADM, and LOADSET. 

There are two additional restrictions on the use of the Condition Code Accumulator: 

1. The Condition Code Accumulator cannot be used as both source and destination 
in the same instruction. For example, the instructions: 

        add     gr3, gr2, lr0 
or 
        add     gr2, gr3, lr0 

are not permitted. 

2. Write-write dependency checking is disabled for any instructions having CCA or 
CCA-shift as the destination. For example, if the instructions: 

        fdiv    gr3, lr0, lr2 
and 
        fmul    gr3, lr4, lr6 

are issued in sequence, hardware interlocks do not guarantee that the instructions 
will complete in sequence. Therefore, only code sequences which guarantee a 
fixed order of completion will give predictable results. Problematic sequences are 
those which contain: 

• Instructions with unequal latencies (as in the example above). 

• Instructions whose latency may change in the presence of denormalized input 
operands or results. These instructions, which include FMUL, DMUL, DDIV, and 
SQRT, can be used if the Fast-Float mode is enabled. 

7.2.12 Generating Large Constants 

Eight-bit constants are directly available to most instructions. Larger constants must 
be generated explicitly by instructions and placed into registers before they can be 
used as operands. The processor has four instructions for the generation of large 
data constants: Constant (CONST); Constant, High (CONSTH); Constant, Negative 
(CONSTN); and Constant High, Zero (CONSTHZ). 

The CONST instruction sets the least-significant 16 bits of a register with a field in the 
instruction; the most-significant 16 bits are set to zero. This instruction allows a 32-bit 
positive constant to be generated with one instruction, when the constant lies in the 
range of 0 to 65535. 

Any 32-bit constant may be generated with a combination of the CONST and 
CONSTH instructions. The CONSTH instruction sets the most-significant 16 bits of a 
register with a field in the instruction; the least-significant bits are not modified. Thus, 
to create a 32-bit constant in a register, the CONST instruction sets the 
least-significant 16 bits, and the CONSTH instruction sets the most-significant 16 bits. 

The CONSTN instruction sets the least-significant 16 bits of a register with a field in 
the instruction; the most-significant 16 bits are set to ones. This instruction allows a 
32-bit, negative constant to be generated with one instruction, when the constant lies 
in the range of -65536 to -1. 

The CONSTHZ instruction sets the most-significant 16 bits of a register with a field in 
the instruction; the least-significant 16 bits are set to zero. This facilitates the 
generation of floating-point constants. 
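
The effect of the four instructions on a register can be summarized in Python. This is a sketch of the data movement described above; the function names mirror the mnemonics but are otherwise illustrative.

```python
def const(field16):
    """CONST: low half from the instruction field, high half zero."""
    return field16 & 0xFFFF


def consth(register, field16):
    """CONSTH: high half from the instruction field, low half of register kept."""
    return ((field16 & 0xFFFF) << 16) | (register & 0xFFFF)


def constn(field16):
    """CONSTN: low half from the instruction field, high half all ones."""
    return 0xFFFF0000 | (field16 & 0xFFFF)


def consthz(field16):
    """CONSTHZ: high half from the instruction field, low half zero."""
    return (field16 & 0xFFFF) << 16


def load_constant(value32):
    """Build any 32-bit constant with CONST followed by CONSTH."""
    return consth(const(value32 & 0xFFFF), (value32 >> 16) & 0xFFFF)


if __name__ == "__main__":
    print(hex(load_constant(0x12345678)))
    print(hex(constn(0xFFFE)))      # the 32-bit pattern for -2
    print(hex(consthz(0x3F80)))     # the single-precision pattern for 1.0
```

The last line shows why CONSTHZ suits floating-point constants: many common values have all of their significant bits in the upper half of the word.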

7.2.13 Large Jump and Call Ranges 

The 16-bit relative branch displacement provided by processor instructions is 
sufficient in the majority of cases. However, addresses with a greater range occasionally 
are needed. In these cases, the CONST and CONSTH instructions generate the large 
branch-target address in a register. An indirect jump or call then uses this address to 
branch to the appropriate location. 

When program modules are compiled separately, the compiler cannot determine 
whether or not the 16-bit displacement of a CALL instruction is sufficient to reach 
an external procedure, even though it is sufficient in most cases. Instead of generating 
instructions for the worst case (i.e., the CONST, CONSTH, and CALLI described 
above), it is more efficient to generate a CALL as if it were appropriate, with the 
worst-case sequence (in this case, CONST, CONSTH, and JMPI) also appearing in 
the generated code somewhere (e.g., at the end of a compiled procedure). 

When the above scheme is used, the linker is able to determine whether or not the 
CALL is sufficient. If it is not, the CALL can be re-targeted to the worst-case sequence 
in the code. In other words, when the CALL is not sufficient, the linker causes the 
execution sequence to be: 

        call    ->      const 
                        consth 
                        jmpi 

In this manner, the longer execution time for the call occurs only when necessary. 

7.2.14 NO-OPs 

When a NO-OP is required for proper operation (e.g., as described in Section 7.4.3), it 
is important that the selected instruction not perform any operation, regardless of 
program operating conditions. For example, the NO-OP cannot access general-purpose 
registers, because a register may be protected from access in some situations. 
The suggested NO-OP is: 

        aseq    0x40, gr1, gr1 

This instruction asserts that the Stack Pointer (GR1) is equal to itself. Since the 
assertion is always true, there is no trap. Note also that the Stack Pointer cannot be 
protected, and that the assert instruction cannot affect any processor state. 

7.2.15 Character-String Operations 

The need to perform operations on character strings arises frequently in many sys-
tems. The processor provides operations for manipulating character data, but these
are frequently inefficient for dealing with character strings, since the processor is
optimized for 32-bit data quantities.

It is much more efficient, in general, to perform character-string operations by operat- 
ing on units of four bytes each. These four-byte units are more suited to the proces- 
sor's data-flow organization. However, there are several things to be considered when 
dealing with four-byte units, as outlined in this section. 

7.2.15.1 ALIGNMENT OF BYTES WITHIN WORDS

Character strings normally are not aligned with respect to 32-bit words. Thus, when 
word operations are used to perform character-string operations, alignment of the 
character strings must be taken into account. 

For example, consider a character string aligned on the third byte of a word that is 
moved to a destination string aligned on the first byte of a word. If the movement is 
performed word-at-a-time, rather than byte-at-a-time, the move must involve shift and 
merge operations, since words in the destination character-string are split across 
word boundaries in the source character string. 

The processor's Funnel Shifter can be used to perform the alignment operations 
required when character operations are performed in four-byte units. Though the 
Funnel Shifter supports general bit-aligned shift and merge operations, it easily is
adapted to byte-aligned operations. 

For byte-aligned shift and merge operations, it is only necessary to ensure that the two
most-significant bits of the Funnel Shift Count (FC) field of the ALU Status Register 
point to a byte within a word, and that the three least-significant bits of the FC field 
are 000. 
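
A byte-aligned shift and merge can be modeled as below. This is a sketch under stated assumptions, not a definition of the hardware: it assumes the funnel shift treats two source words as one 64-bit value, shifts it left by the FC count, and keeps the upper 32 bits, and that byte 0 is the most-significant byte of a word. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def fc_for_byte(byte_index):
    """5-bit shift count for a byte boundary: the byte index sits in the
    two most-significant bits and the three low bits are 000."""
    if not 0 <= byte_index <= 3:
        raise ValueError("byte index must be 0..3")
    return byte_index << 3

def funnel_extract(high, low, fc):
    """Shift the 64-bit value high:low left by fc bits and return the
    upper 32 bits of the result (assumed funnel-shift behavior)."""
    combined = ((high & MASK32) << 32) | (low & MASK32)
    return ((combined << fc) >> 32) & MASK32
```

For a source string that starts on the third byte of a word, fc_for_byte(2) gives a count of 16, and funnel_extract(0x00004142, 0x43444546, 16) merges the two source words into the single destination word 0x41424344.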

7.2.15.2 DETECTION OF CHARACTERS WITHIN WORDS 

Most character-string operations require the detection of a particular character within 
the string. For example, the end of a character string is identified by a special charac-
ter in some character-string representations. In addition, character strings often are
searched for a specific pattern. During such searches, the most-frequently executed 
operation is the search within the character string for the first character of the pattern. 

The processor provides a Compare Bytes (CPBYTE) instruction, which directly sup- 
ports the search for a character within a word. This instruction can provide a factor-of- 
four performance increase in character-search operations, since it allows a character 
string to be searched in four-byte units. 

During the search, the words containing the character string are compared, a word at 
a time, to a search key. The search key has the character of interest in every byte 
position. The CPBYTE instruction then gives a result of TRUE if any character within
the character-string word matches the corresponding byte in the search key. 
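
The word-at-a-time search can be illustrated with a small Python model. It only mimics the comparison described above (a match in any of the four byte positions); the helper names search_key, cpbyte, and find_char are hypothetical.

```python
def search_key(ch):
    """Replicate one character into all four byte positions of a word."""
    return (ch & 0xFF) * 0x01010101

def cpbyte(a, b):
    """TRUE if any byte of word a equals the byte in the same position of b."""
    diff = (a ^ b) & 0xFFFFFFFF
    return any(((diff >> shift) & 0xFF) == 0 for shift in (24, 16, 8, 0))

def find_char(words, ch):
    """Return the index of the first word containing ch, or -1 if none does."""
    key = search_key(ch)
    for index, word in enumerate(words):
        if cpbyte(word, key):
            return index
    return -1
```

Once a word reports a match, the position of the character within that word still has to be found by examining its four bytes individually.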

7.2.16 Movement of Large Data Blocks

The movement of large blocks of data (for example, to perform a memory-to-memory
move) can be performed by an alternating series of loads and stores. However, it is
normally much more efficient to move large blocks of data by using an alternating 
series of Load Multiple and Store Multiple instructions. These instructions take better 
advantage of the data-movement capabilities of the processor, though they require 
the use of a large number of registers. 

During data movement, it is possible to perform alignment operations by a series of 
EXTRACT instructions between the Load Multiple and Store Multiple. Also, since the
Load Multiple and Store Multiple are interruptible, these instructions may be used to 
move large amounts of data without affecting interrupt latency. 

7.3 SYSTEMS-PROGRAMMING CONSIDERATIONS 

This section discusses topics of general concern in the implementation of control
programs and operating systems. 

7.3.1 System Protection 

The Am29050 microprocessor provides protection of several different system re-
sources. In general, this protection is based on the value of the Supervisor Mode
(SM) bit in the Current Processor Status Register. 

7.3.1.1 MEMORY PROTECTION 

Memory and input/output access protection is provided by the Memory Management 
Unit. Each Translation Look-Aside Buffer entry in the MMU contains protection bits 
which determine whether or not an access to the page associated with the entry will 
be permitted. Each Region Mapping Control Register also contains protection bits to 
control access to the virtual region it maps.

There is a set of protection bits for Supervisor-mode programs, and a separate set for
User-mode programs. Thus, for the same virtual page or region, the access authority
of programs executing in the Supervisor mode can be different from the authority of
programs executing in User mode.

A Data MMU Protection Violation or Instruction MMU Protection Violation trap occurs 
if a data or instruction access, respectively, is attempted, but is not allowed because
of the value of the protection bits. 

7.3.1.2 REGISTER PROTECTION 

General-purpose registers are protected by the Register Bank Protection Register. 
The Register Bank Protection Register allows parameters for the operating system 
to be kept in general-purpose registers, protected from corruption by User-mode
programs. 

If a User-mode program attempts to access a protected general-purpose register, a 
Protection Violation trap occurs. Supervisor-mode programs may access any general- 
purpose register, regardless of protection. 

The special-purpose registers 0 to 127 and all Translation Look-Aside Buffer registers
are protected from User-mode access. Any attempted access of these registers by a
User-mode program causes a Protection Violation trap. The special-purpose registers
163 and 165 to 255 (though not implemented) are protected from any access. Any
attempted access of special-purpose registers 163 and 165 to 255, even in the Super-
visor mode, causes a Protection Violation trap. This permits virtualization of these
special registers.

7.3.1.3 EXTERNAL ACCESS PROTECTION

Other than the protection offered by the Memory Management Unit, the processor 
provides no specific protection for external devices and memories. However, the 
SUP/US output reflects the value of the SM bit during the address cycle of an external 
access. This can signal external devices and memories to provide protection. Any
protection violations can be reported via the DERR input.

7.3.2 Interrupts and Traps

The Am29050 microprocessor automatically saves only the Current Processor Status 
Register in the Old Processor Status Register when an interrupt or trap is taken. The
processor does not automatically save any other state when an interrupt or trap is
taken, but rather freezes the contents of the following registers: 

1. Program Counters 0, 1, and 2.

2. Channel Address, Channel Data, and Channel Control. 

3. ALU Status. 

When these registers are frozen, they are allowed to be updated only by Move To 
Special Register instructions. The frozen condition is controlled directly by the Freeze
(FZ) bit in the Current Processor Status Register.

Since the Channel Address, Channel Data, and Channel Control registers are frozen 
when an interrupt or trap is taken, the interrupt handler may perform single-access
loads and stores without interfering with the restart state of a channel operation in the
interrupted routine. However, load-multiple and store-multiple operations have unpre-
dictable results if performed while the FZ bit is 1, since these operations are se-
quenced by the Channel Control Register.

7.3.2.1 VECTOR AREA



As discussed in Section 3.5.4, interrupts and traps are dispatched through a
256-entry Vector Area, which directs the processor to a routine to handle a given 
interrupt or trap. Only 64 entries of this area are required for basic processor opera- 
tion (or 22, if instruction emulation is not used). 

The required number of Vector Area entries is system-dependent, as determined by 
the vector numbers that are specified in the assert and EMULATE instructions. The 
number of entries can be restricted to reduce the memory requirements for the Vector 
Area, which is especially important when the Vector Area is organized as a sequence
of 64-instruction blocks. However, there is nothing to prevent an instruction from
specifying a vector number in the range 64 to 255. For this reason, it may not be
possible to reduce the size of the Vector Area, since erroneous instruction vector
numbers might cause unpredictable results. 

The Vector Area may be relocated by the Vector Area Base Address Register, and 
there may be multiple Vector Areas in the system, with the Vector Area Base Address
Register pointing to the one that is currently active. 



7.3.2.2 INTERRUPT HANDLING 



For temporary program interruptions, such as for Translation Look-Aside Buffer
reload, the basic processor interrupt mechanism is sufficient to eliminate the need for
the interrupt or trap handler to save any state for the interrupted routine. This state
may be left in the appropriate registers while the handler executes. An interrupt return
returns immediately to the interrupted program.

Besides the direct performance advantage that results from not saving state for tem- 
porary program interruptions, there is an additional advantage provided by the proc- 
essor. When the state of the interrupted routine remains in the appropriate registers, 
the processor can detect that the Program Counter 0 and Program Counter 1 regis-
ters contain sequential addresses. Instead of performing two non-sequential instruc-
tion fetches for the interrupt return in this case, the processor initiates only a single
non-sequential fetch (the second fetch is performed as a sequential fetch). This re- 
duces the overhead of the interrupt return for these routines. 

Note that when the state of an interrupted program remains in the processor, the
processor cannot be enabled to take any further interrupts until an interrupt return is
executed. Therefore, this capability should be restricted to time-critical routines,
where the execution time of the routine does not interfere with interrupt-latency con-
siderations. (Note that the Interrupt Pending bit of the Current Processor Status Reg-
ister may be used to detect the presence of external interrupts while these interrupts
are disabled.)

To support dynamically nested interrupts and traps, the interrupt or trap handler must
save state as necessary for the application, using an appropriate data structure (such
as an interrupt stack or program status area). Once the state has been saved (or,
alternatively, while it is being saved), the handler can load the state for a new program
to be executed. An interrupt return then initiates the execution of the new program.

When the interrupt or trap handler saves the floating-point accumulators, the Accumu-
lator Format (ACF) field of the Floating-Point Environment Register may not indicate
the actual format of the accumulators, because of modifications to the ACF field be-
fore the interrupt or trap was taken. The interrupt or trap handler should treat the
accumulators as containing double-precision values. This requires forcing the ACF
field to 10 (double-precision) after saving the Floating-Point Environment Register and
before executing an MFACC instruction to save the accumulators.

7.3.2.3 INTERRUPT RETURN

An interrupt return resumes the execution of a program whose processor state is 
contained in the following registers: 

1. Old Processor Status.

2. Program Counters 0 and 1.

3. Channel Address, Channel Data, and Channel Control. 

This state is most likely different from the state of the program executing the interrupt 
return. These registers must be set appropriately before an interrupt return is
executed. 

Note that the instruction sequence that sets these registers must have a Current 
Processor Status that is equivalent to that of an interrupt or trap handler; the FZ bit 
must be 1, and interrupts and traps must be disabled.

7.3.2.4 SIMULATION OF INTERRUPTS AND TRAPS 

Assert instructions may be used by a Supervisor-mode program to simulate the oc- 
currence of various interrupts and traps defined for the processor. Only an assert 
instruction executed in Supervisor mode can specify a vector number between 0 and
63. If this instruction causes a trap, the effect is to create an interrupt or trap which is
similar to that associated with the specified vector number. 

Thus, the interrupt and trap routines defined for basic processor operation can be
invoked without creating any particular hardware condition. For example, an INTR1
interrupt may be simulated by an assert instruction that specifies a vector number
of 17, without the activation of the INTR1 signal. 

7.3.2.5 TRAPS IN SYSTEM-LEVEL ROUTINES

The Monitor trap and Monitor mode provide a mechanism for handling traps in
system-level routines in a manner that allows these routines to be restarted. This 
permits error recovery and debugging of system-level routines. 

7.3.3 Memory Management 

This section discusses various issues involved in memory management as they relate
to an operating system. The focus is on virtual-addressing issues. 

7.3.3.1 VIRTUAL PAGE SIZE 

The MMU Configuration Register determines the size of a virtual page mapped by the 
Memory Management Unit. The choices for page size are 1, 2, 4, and 8 kb. The se-
lection of page size is based on several considerations:

1. For a given page size, any allocation of pages to a process will, on average,
waste half of one page. With smaller page sizes, the waste is smaller. In systems 
with a large number of processes, each with a small amount of memory, small 
page sizes can reduce waste significantly. 

2. Smaller page sizes allow finer memory-protection granularity. 

3. The maximum amount of memory that can be referenced by Translation 
Look-Aside Buffer (TLB) entries is set by the number of TLB entries and the page 
size. Larger page sizes allow the fixed number of TLB entries to address more 
memory, and generally reduce the number of TLB misses. For example, with 1-kb 
pages, a process requiring 8 kb of contiguous memory would create eight TLB 
misses; with 8-kb pages, the process would create only one TLB miss.

4. The page is usually the unit of memory moved between memory and backing
storage. The design of the backing storage sub-system also may influence the 
choice of page size, because of transfer-efficiency considerations. For example, if 
the backing storage is a disk, the disk seek time is large compared to transfer 
time. Thus, it is more efficient to transfer large amounts of data with a single seek. 
Efficiency may also depend on disk organization (i.e., the number of seeks 
possibly required to transfer a page). 
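
The arithmetic behind considerations 1 and 3 can be checked with a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, the region is assumed to start on a page boundary, and the number of TLB entries is left as a parameter rather than taken from this section.

```python
KB = 1024

def expected_waste(page_size, allocations=1):
    """Average unused memory: half of one page per allocation."""
    return allocations * page_size // 2

def tlb_reach(entries, page_size):
    """Maximum memory that a fixed number of TLB entries can map."""
    return entries * page_size

def initial_misses(region_bytes, page_size):
    """TLB misses taken when first touching a contiguous, page-aligned
    region (one miss per distinct page)."""
    return -(-region_bytes // page_size)  # ceiling division
```

With these definitions, initial_misses(8 * KB, 1 * KB) is 8 and initial_misses(8 * KB, 8 * KB) is 1, matching the example in item 3, while expected_waste grows from 512 bytes to 4096 bytes per allocation across the same two page sizes.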

7.3.3.2 PAGE REFERENCE AND CHANGE INFORMATION 

In a demand-paged environment, it is important to be able to collect information on
the use and modification of pages. The processor does not collect this information
directly, but the information may be collected by the operating system, without requir-
ing hardware support.

Each TLB entry contains six bits which specify the type of accesses that are permitted 
for the corresponding page. When a TLB entry is loaded, the TLB reload routine can 
set the protection bits so that an access to the corresponding page is not allowed. If 
an access is attempted, a TLB protection violation trap occurs. This trap may be
used to signal that the page is being referenced. After noting this fact, the trap handler 
may set the protection bits to allow the access, and return to the trapping routine. 

A technique similar to the one just described can be used to collect information on the 
modification of a page. However, in this case, the TLB protection bits initially are set 
so that a store is not allowed. 
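
The two techniques can be combined in one software model, sketched below in Python. It is a simplification under stated assumptions: the six protection bits are reduced to two flags (load allowed, store allowed), and the class and function names are hypothetical.

```python
class PageEntry:
    """Software view of one page: protection flags plus collected bits."""
    def __init__(self):
        self.can_load = False    # protection initially denies every access
        self.can_store = False
        self.referenced = False
        self.changed = False

def protection_trap(entry, is_store):
    """Trap handler: note the event, then relax protection just enough."""
    entry.referenced = True
    entry.can_load = True
    if is_store:
        entry.changed = True
        entry.can_store = True

def access(entry, is_store):
    """Perform one access and return the number of traps it caused."""
    allowed = entry.can_store if is_store else entry.can_load
    if allowed:
        return 0
    protection_trap(entry, is_store)
    return 1  # the access is retried after the handler returns
```

In this model a page costs at most two traps between resets: one on the first reference and one on the first store.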

It is also possible to create reference information by noting references during TLB 
reload. For example, reference bits normally are reset periodically, so that they reflect 
current references. When reference bits are reset, the entire TLB may be invalidated. 
Reference bits then are set as TLB entries are loaded. Note that this scheme relies on 
the fact that a TLB miss implies a reference to the corresponding page. Also, this 
scheme does not account for page change information. 

The disadvantage of both of the above schemes is one of possible performance loss. 
This is the result of the additional traps required to monitor page references and 
changes. If the performance impact is unacceptable, references and changes can be 
monitored easily by hardware that detects reads and writes to page frames in instruc-
tion or data memory.

7.3.3.3 MONITORING CRITICAL AREAS OF MEMORY 

In certain fault-tolerant systems, it is necessary to detect changes to critical areas of 
memory, so that these changes may be reflected immediately on a non-volatile stor- 
age device. To monitor critical memory areas, the TLB protection bits can be set so 
that any change to the area causes a Data TLB Protection Violation trap. This trap 
signals that the area is being modified. 

In this use of the protection bits, the trap handler does not set the bits to allow the 
access. Rather, the trap handler must emulate the access, using the Channel Ad- 
dress, Channel Data, and Channel Control registers. The Contents Valid (CV) bit of 
the Channel Control Register is reset before the trapping routine is restarted, so that
the trap does not recur. 

7.3.3.4 TLB MISS HANDLING 

The address translation performed by the MMU is ultimately determined by routines 
that place entries into the Translation Look-Aside Buffer (TLB). TLB entries normally
are based on system page tables, which give the translation for a large number of 
pages. The TLB simply caches the currently-needed translations, so that system page 
tables do not have to be accessed for every translation.

If a required address translation cannot be performed by any entry in the TLB, a TLB
miss trap occurs. The trap handling routine, called the TLB reload routine, accesses
the system page tables to determine the required translation, and sets the appropriate 
TLB entry. Note that the access requiring this translation can be restarted by the 
interrupt return at the end of the TLB reload routine (see Section 7.3.4). 

A large number of different page-table organizations are possible. Since the TLB 
reload routine is a sequence of processor instructions, the page tables may have a 
structure and access method that satisfies trade-offs of page table size, translation 
lookup time, and memory-allocation strategies. 

Another possibility supported by the TLB reload mechanism is that of a second-level 
TLB. The TLB reload routine is not required to access the system page tables imme-
diately upon a TLB miss, but may access an external TLB, which can be much larger
than the processor's TLB. The amount of time required to access the external TLB
normally is much smaller than the amount of time required to access the page tables,
leading to an overall improvement in performance. Of course, if a translation is not in
the external TLB, a page table lookup still must be performed. 
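
The lookup order for a reload routine backed by a second-level TLB might look like the following Python sketch. The data structures are assumptions made for illustration (dictionaries for the external TLB and page table, a list of slots for the processor's TLB), and tlb_reload is a hypothetical name; the slot argument stands in for a replacement choice such as the one recommended by the processor.

```python
def tlb_reload(vpage, tlb, external_tlb, page_table, slot):
    """Handle a TLB miss for virtual page vpage.

    Tries the external TLB first and falls back to the page table.
    Returns the name of the level that supplied the translation."""
    if vpage in external_tlb:
        frame, source = external_tlb[vpage], "external"
    elif vpage in page_table:
        frame, source = page_table[vpage], "page_table"
        external_tlb[vpage] = frame   # keep the second level warm
    else:
        raise LookupError("page fault: no translation for page %d" % vpage)
    tlb[slot] = (vpage, frame)        # install in the processor's TLB
    return source
```

A translation that has been evicted from the small on-chip TLB is then usually recovered from the external TLB, and the slower page-table walk is needed only on the first use of a page.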

Because the TLB reload routine may depend on the type of access causing the TLB 
miss, the processor differentiates between misses on instruction and data accesses 
by Supervisor-mode and User-mode programs. This eliminates any time which might 
be spent by the TLB reload routine in making the same determination. Performance is
also enhanced by the LRU Recommendation Register, which gives the TLB register-
number for Word 0 of the TLB entry to be replaced by the TLB reload routine (the
least-recently-used entry). 

7.3.3.5 WARM START 

When a process switch occurs, there is a high probability that most of the TLB entries
of the old process will not be used by the new process. Thus, the new process most
likely creates many TLB miss traps early in its execution. This is unavoidable on the
first initiation of a process, but may be prevented on subsequent initiations. 

When a given process is suspended, the operating system can save a copy of its TLB
contents. When the task is restarted, the copy can be loaded back into the TLB. This
warm start prevents many of the process' initial TLB misses, at the expense of the 
time required to save and restore the copy of the TLB entries. However, this time may 
be much shorter than the time required to perform all TLB re-loads individually. 

Note that if this warm-start strategy is adopted, any change in address translation
must be reflected in all copies of TLB entries for all affected processes. If address
translation is changed often so that it affects more than one process, warm start may
not be advantageous. 

7.3.3.6 MINIMUM NUMBER OF RESIDENT PAGES

In any processor that supports demand-paging, there is a minimum number of pages 
that must be resident for any active process. This minimum is determined by the
maximum number of pages that might be referenced by an atomic operation in the
processor's architecture (e.g., an instruction, normally). If this maximum number is not
guaranteed to be resident in memory, some operations might never complete, since
they may never have all of the required pages resident in memory at one time. 

For the Am29050 microprocessor, two pages are required for a process to make 
progress through the system. The reason for this requirement is that the Am29050 
microprocessor, on interrupt return, restarts an interrupted Load Multiple or Store
Multiple only after fetching two instructions (see Section 3.5.5). The first of these
instructions must be resident in memory (and mapped by the TLB), and the page
required to complete the Load Multiple or Store Multiple must also be resident (and
mapped by the TLB) for the interrupt return to complete successfully.

7.3.3.7 REGION MAPPING UNIT OPERATION 

The Region Mapping Units (RMUs) also perform translation from a virtual address to 
a physical address. Each of the two RMUs can map a region of contiguous virtual 
addresses to an equivalent-sized region of contiguous physical addresses. The region 
size can range from 64 kb to 2 Gb in power-of-two increments. The RMUs allow large 
blocks of contiguous physical memory to be mapped in the virtual address space 
without the overhead of TLB miss handling or the possibility of replacing required TLB 
entries. For example, operating-system kernels exhibit much less locality-of-reference 
than applications programs; an operating-system reference causing a TLB miss does 
not later use the same TLB entry as often as an application program. Using the TLB 
to map operating-system references can degrade performance and replace valid TLB 
entries of the calling application. By mapping the operating-system references with 
the RMUs, this overhead is eliminated. 

Like the TLB entries, each RMU entry has six bits which can be used to implement
protection as well as collect reference and change information. When both RMUs
map a given virtual address, RMU0 has priority over RMU1, and both have priority
over the TLB entries. Upon an MMU Protection Violation trap, the trap handler (either
data or instruction) should first check RMU0 to see if that unit caused the exception.
Following this, it should check RMU1 and finally the TLBs. 

If a valid translation does not exist in either RMU0 or RMU1, then the processor uses
the TLB for translation. If no valid TLB translation exists, then a TLB miss trap occurs.
The TLB miss handler may decide whether or not to use an RMU instead of a TLB entry
to handle the miss. 
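
The priority order among RMU0, RMU1, and the TLB can be expressed as a short Python model. It is a sketch only: each RMU is represented as a hypothetical (virtual base, size, physical base) tuple, the TLB as a dictionary from virtual page number to frame number, and protection checks are omitted.

```python
def translate(vaddr, rmu0, rmu1, tlb, page_size=1024):
    """Return (source, physical address) for a virtual address.

    rmu0 and rmu1 are either None or (virt_base, size, phys_base).
    tlb maps virtual page numbers to physical frame numbers."""
    for name, rmu in (("RMU0", rmu0), ("RMU1", rmu1)):  # RMU0 wins ties
        if rmu is not None:
            virt_base, size, phys_base = rmu
            if virt_base <= vaddr < virt_base + size:
                return name, phys_base + (vaddr - virt_base)
    vpage, offset = divmod(vaddr, page_size)
    if vpage in tlb:
        return "TLB", tlb[vpage] * page_size + offset
    return "TLB miss", None  # the reload routine runs at this point
```

A trap handler that needs to find which unit produced a protection violation can walk the same order: RMU0, then RMU1, then the TLB.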

7.3.3.8 BRANCH TARGET CACHE MEMORY CONSIDERATIONS 

The Branch Target Cache memory is accessed with virtual as well as physical ad- 
dresses, depending on whether address translation is enabled for instruction ac- 
cesses. Because of this, the Branch Target Cache memory may contain entries that 
might be considered valid, even though they are not. 

For example, address translation may be changed by modifying the Process Identifier 
of the MMU Configuration Register. This change is not reflected in the Branch Target 
Cache memory tags, so the tags do not necessarily perform valid comparisons. 

If a TLB miss occurs during the address translation for a branch target instruction, the 
processor considers the contents of the Branch Target Cache memory to be invalid. 
This is required to properly sequence the LRU Recommendation Register, and does 
not solve the problem just described. If the TLB is changed at some point, so that the 
TLB miss does not occur, the Branch Target Cache memory still may perform an 
invalid comparison. 

To avoid the above problem, the contents of the Branch Target Cache memory must 
be invalidated explicitly whenever address translation is changed. This can be accom-
plished by executing an Invalidate (INV) instruction whenever an address translation
is changed. The INV instruction causes all entries of the Branch Target Cache mem-
ory to become invalid (after the next successful branch). However, since the change
in address translation rarely affects the program performing the change, the INV may
unnecessarily affect the performance of this program.

The IRETINV instruction has the same effect on the Branch Target Cache memory as
the INV instruction, but can reduce the performance impact. The IRETINV delays 
invalidation until an interrupt return is executed, eliminating the need to disrupt an 
operating-system routine when it changes address translation. At the point of interrupt 
return, the contents of the Branch Target Cache memory are most likely not of much 
use anyway. 

Note that the Branch Target Cache memory is not invalidated when the Cache Dis-
able (CD) bit of the Configuration Register is set. When the CD bit is 1, the Branch
Target Cache memory continues to operate, but the processor considers its contents
to be invalid. Thus, the CD bit cannot be used to invalidate the cache, and, further- 
more, the Branch Target Cache memory may have to be invalidated whenever the CD 
bit is to be reset (i.e., when the cache is to be enabled). 

The Branch Target Cache memory distinguishes between virtual and physical ad- 
dresses, between the instruction RAM and instruction read-only memory (ROM) 
address spaces, and between User-mode and Supervisor-mode addresses. Thus, the 
Branch Target Cache memory does not have to be invalidated on transitions between 
these address spaces. This improves the performance of applications that make 
heavy use of ROM-based and/or operating-system routines in either physical or vir- 
tual address space. 

7.3.4 Restarting Faulting External Accesses 

In a demand-paged system environment, virtual pages and their associated virtual-to- 
physical mappings are made available to programs on demand. In other words, the 
memory-management routines generally execute only when a given page or mapping 
is needed by a program. This need is signaled by a page fault trap caused by a pro- 
gram access (normally, the page fault occurs during a TLB reload). 

Since the page fault trap is part of normal system operation, and does not represent 
an error, the access that causes the trap must be restarted, once the trapping condi-
tion is remedied, in a manner that is not detectable to the program causing the trap.

Additionally, in the Am29050 microprocessor, the TLB reload mechanism relies on the
ability to restart an access that causes a TLB miss trap. This restart also must be 
accomplished in a manner that cannot be detected by the trapping program. 

The Am29050 microprocessor overlaps external accesses with the execution of in-
structions. Thus, traps caused by accesses are imprecise: the address of the instruc-
tion that initiated the access cannot be determined by the trap handler. Since the
address of the initiating instruction is unknown, the access cannot be restarted by 
re-executing this instruction. Even if the address could be determined, the instruction 
might not be restartable, since an instruction executed before the trap occurred, but 
after the access began, may have altered the conditions of the access, such as by 
altering the address source register. 

In order to provide for the restarting of loads and stores that cause exceptions, the 
processor saves all information required to restart these accesses in the Channel
Address, Channel Data, and Channel Control registers. The Contents Valid (CV) and
Not Needed (NN) bits in the Channel Control Register indicate that the information
contained in these registers represents an access that must be restarted. The CV bit
indicates that the access did not complete, and the NN bit indicates whether or not the
data from the access is required by the processor. 

Note that since instruction execution is overlapped with external accesses, an instruc-
tion that executes after a load may alter the destination register for the load. If a trap
occurs in this situation, the access information in the Channel Address, Data, and
Control registers is correct, but the load cannot be restarted. The NN bit provides
correct operation in this case.

When an interrupt or trap is taken, the handling routine has access to the Channel 
Address, Data, and Control registers; the contents of these registers may contain 
information relevant to an incomplete access, and can be preserved for restarting this
access. Since these registers are frozen (due to the FZ bit of the Current Processor
Status), they are not available to monitor any external accesses in the interrupt or trap
handler until their contents are saved, and the FZ bit is reset. 

Please note that the exception handler for the Data Access Exception trap must clear 
the Transaction Faulted (TF) bit in the Channel Control Register. Failure to clear the 
TF bit will result in the Am29050 microprocessor taking the trap again, once the ex- 
ception handler returns, causing an infinite series of traps. 

The processor restarts an access, using the Channel Address, Channel Data, and 
Channel Control registers, upon an interrupt return (IRET or IRETINV). The access is 
initiated if the CV bit of the Channel Control Register is 1 and the NN bit is 0. The 
restart cannot be detected in the logical operation of the restarted routine, although 
the timing of its execution is altered. 
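
The conditions that govern what happens at interrupt return can be summarized in a small Python model. This is a hedged sketch: only the CV, NN, and TF bits named in this section are represented, the function and class names are hypothetical, and checking TF before CV is an assumption of the model rather than a statement about the hardware.

```python
from dataclasses import dataclass

@dataclass
class ChannelControl:
    cv: int = 0  # Contents Valid: the saved access did not complete
    nn: int = 0  # Not Needed: the data is no longer required
    tf: int = 0  # Transaction Faulted: must be cleared by the handler

def interrupt_return(chc):
    """Describe the effect of an interrupt return for the given bits."""
    if chc.tf:
        return "trap again"       # handler forgot to clear TF
    if chc.cv == 1 and chc.nn == 0:
        return "restart access"   # saved access is re-issued
    return "no restart"
```

A handler that emulates the access itself, as in the critical-area monitoring scheme of Section 7.3.3.3, would reset CV before returning so that the model reports "no restart".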

The mechanism used to restart faulting accesses has the additional benefit of allow-
ing a fast interrupt-response time when the processor is performing a load-multiple or
store-multiple operation. Interrupted load-multiple and store-multiple operations are 
restarted as if they had faulted. In this case, the operation resumes from the point of 
interruption, not the beginning of the sequence. 

7.3.5 Multiple-Processor Systems 

The Am29050 microprocessor provides several facilities for the implementation of 
multi-programming and multi-processing systems. These facilities help provide mutual 
exclusion, synchronization, and communication between multiple processes, whether 
these processes execute on a single processor or multiple processors. 

Binary semaphores are supported by the Load and Set (LOADSET) instruction. This 
instruction loads the contents of an external location into a register and atomically 
sets the contents of the location to the integer -1. This instruction requires no special 
hardware support in the system, since all sequencing is performed by the processor. 
Also, the LOADSET is available to User-mode programs. This eliminates the over- 
head of an operating-system call in the use of binary semaphores. 
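
The LOADSET behavior described above can be sketched as a small behavioral model. This is an illustration only, not processor code: the class name `ExternalWord` and the convention that 0 means "free" are assumptions made for the example, and the indivisibility that the processor provides on the channel is simply taken for granted by the single method call.

```python
class ExternalWord:
    """Hypothetical model of one external memory word used as a binary semaphore."""

    def __init__(self, value=0):
        self.value = value

    def loadset(self):
        """Model of LOADSET: return the old contents and leave -1 behind.

        On the real processor the load and the store form one indivisible
        access; here the whole method stands in for that single access.
        """
        old = self.value
        self.value = -1
        return old


def try_acquire(sem):
    # Assumed convention: 0 means free, -1 means taken.
    return sem.loadset() == 0


def release(sem):
    # An ordinary store of 0 frees the semaphore again.
    sem.value = 0


sem = ExternalWord(0)
first = try_acquire(sem)    # semaphore was free, caller now owns it
second = try_acquire(sem)   # already taken, so this attempt fails
release(sem)
third = try_acquire(sem)    # free again after the release
```

Because the old value is returned by the same access that writes -1, a process can tell whether it was the one that took the semaphore without any operating-system call.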

The instructions Load and Lock (LOADL) and Store and Lock (STOREL) support the 
locking of external devices and memories, or the locking of particular locations within 
an external device or memory. This prevents access by any process or processor 
other than the one that performed the lock, and provides the flexibility of locking in a 
manner appropriate to the system and application. The LOADL and STOREL instruc- 
tions are available to User-mode programs. 

To indicate that a LOADL or STOREL is being executed, the processor asserts the 
LOCK output during the external access. Since the processor cannot control the 
behavior of external devices and memories directly, system hardware must support 
locking, if required. 

Note that the protocol for the locking and unlocking of devices and memories must be 
defined by the system. For example, the protocol may be defined such that a LOADL 
locks the device or memory, and a STOREL unlocks the device or memory. Between 
the execution of the LOADL and the STOREL, the device can be accessed by the 
locking process with any combination of normal loads and stores. 

For the implementation of a general-purpose exclusion, synchronization, and/or 
communication scheme, the processor allows Supervisor-mode programs to set the Lock 
(LK) bit in the Current Processor Status. This bit activates the LOCK pin, and prevents 
the processor from relinquishing the channel to another channel master. (If another 
master already has control of the channel when the LK bit is set, the LK bit does not 
take effect until control of the channel is returned to the processor.) 

The LK bit allows a Supervisor-mode program to execute with mutual exclusion for 
any sequence of instructions. However, because interrupts also must be disabled for 
true exclusion, this may have a negative impact on system performance if used im- 
properly. 

7.3.6 Timer Facility 

The processor has a built-in Timer Facility that can be configured to cause periodic 
interrupts. The Timer Facility consists of two special-purpose registers — the Timer 
Counter and the Timer Reload registers — that are accessible only to Supervisor-mode 
programs. These registers implement timing functions independent of program 
execution. 

7.3.6.1 TIMER FACILITY OPERATION 

The Timer Counter Register has a 24-bit Timer Count Value (TCV) field that decre- 
ments by one on every processor cycle. If the TCV field decrements to zero, it is 
written with the Timer Reload Value (TRV) field of the Timer Reload Register on the 
next cycle; the Interrupt (IN) bit of the Timer Reload register is set at the same time. 
The re-loading of the TCV field by the TRV field maintains the accuracy of the Timer 
Facility. 

The Timer Reload Register contains the 24-bit TRV field and the control bits Overflow 
(OV), Interrupt (IN), and Interrupt Enable (IE). The TCV field and IN bit were 
described above. If the IN bit is 1 and the IE bit is also 1, a Timer interrupt occurs. If the IN 
bit is 1 when the TCV field decrements to zero, the OV bit also is set. The OV bit 
indicates that a Timer interrupt may have occurred before a previous interrupt was 
serviced. 
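
As a rough illustration of the counting, reload, and status behavior described above, the following sketch models the Timer Facility one cycle at a time. The class and attribute names are invented for the example, and the exact cycle on which OV is set relative to the reload is a simplifying assumption.

```python
class TimerFacility:
    """Hypothetical cycle-level model of the Timer Counter and Timer Reload registers."""

    def __init__(self, tcv, trv, ie=True):
        self.tcv = tcv & 0xFFFFFF   # 24-bit Timer Count Value
        self.trv = trv & 0xFFFFFF   # 24-bit Timer Reload Value
        self.ie = ie                # Interrupt Enable
        self.in_bit = False         # Interrupt (IN)
        self.ov = False             # Overflow (OV)

    def cycle(self):
        """Advance the model by one processor cycle."""
        if self.tcv == 0:
            # The count reached zero on the previous cycle: reload and set IN.
            if self.in_bit:
                # IN was still set, so an earlier interrupt was not yet serviced.
                self.ov = True
            self.in_bit = True
            self.tcv = self.trv
        else:
            self.tcv -= 1

    def interrupt_requested(self):
        return self.in_bit and self.ie


timer = TimerFacility(tcv=2, trv=3)
for _ in range(3):
    timer.cycle()
first_interrupt = timer.interrupt_requested()

# Let a second interval expire without clearing IN.
for _ in range(4):
    timer.cycle()
missed = timer.ov
```

In this model the second expiry finds IN still set, so OV records that an interrupt may have been missed.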

7.3.6.2 TIMER FACILITY INITIALIZATION 

To initialize the Timer Facility, the following steps should be taken in the specified 
order (it is assumed that Timer interrupts are disabled by the DA bit of the Current 
Processor Status Register during the following steps): 

1. Set the TCV field with the desired interval count for the first timing interval. Note 
that this interval must be sufficiently large to allow the execution of the next step 
before the TCV field decrements to zero (this is normally the case). 

2. Set the TRV field with the desired interval count for the second timing interval. The 
OV and IN bits are reset, and the IE bit is set as desired. Note that the second 
timing interval may be equivalent to the first timing interval. 

7.3.6.3 HANDLING TIMER INTERRUPTS 

The following is a suggested list of actions to be taken to handle a Timer interrupt: 

1. Read the Timer Reload register into a general-purpose register. 

2. Reset the IN bit in the general-purpose register. 

3. Set the TRV field in the general-purpose register to the desired value for the next 
timing interval. Note that, at this time, the Timer Counter is timing the current 
interval. Also, this step may be omitted if all intervals are equivalent. 

4. Write the contents of the general-purpose register back into the Timer Reload 
register. 

5. Test the general-purpose-register copy of the OV bit, and if it is set, report the 
error as appropriate. 

6. Perform any system operations required for the Timer interrupt. 

7. Execute an interrupt return. 
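
Steps 1 through 5 above amount to a read-modify-write of the Timer Reload register, which can be sketched as follows. The bit positions used here (TRV in bits 23-0, IE in bit 24, IN in bit 25, OV in bit 26) are assumptions made for the illustration and should be checked against the register description; the function name is likewise hypothetical.

```python
# Assumed field layout of the Timer Reload register (verify against the register definition).
TRV_MASK = 0x00FFFFFF
IE_BIT = 1 << 24
IN_BIT = 1 << 25
OV_BIT = 1 << 26


def service_timer_reload(timer_reload, next_interval=None):
    """Return (value to write back, overflow flag) for one Timer interrupt."""
    reg = timer_reload & 0xFFFFFFFF           # step 1: read into a general-purpose register
    reg &= ~IN_BIT                            # step 2: reset the IN bit in the copy
    if next_interval is not None:             # step 3: optional new TRV value
        reg = (reg & ~TRV_MASK) | (next_interval & TRV_MASK)
    write_back = reg & 0xFFFFFFFF             # step 4: value written to Timer Reload
    overflowed = bool(reg & OV_BIT)           # step 5: test the copy of the OV bit
    return write_back, overflowed


normal = service_timer_reload(IN_BIT | IE_BIT | 100, next_interval=250)
late = service_timer_reload(OV_BIT | IN_BIT | IE_BIT | 100)
```

The listed steps do not say whether the handler clears OV, so the sketch leaves that bit unchanged in the value written back.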

7.3.6.4 TIMER FACILITY USES 

Since the Timer Facility has a resolution of a single processor cycle, it may be used to 
perform precise timing of system events. For example, it may be used to determine an 
exact measurement of the number of cycles between two events in the system, or to 
perform precise time-critical control functions. Note that the Timer interrupt is enabled 
and disabled separately from other processor interrupts, so that its priority can be 
separately specified. 

The Timer Facility can be used to generate time intervals for collecting virtual page 
usage information (see Section 7.3.3). For example, if memory management relies on 
a working-set page-replacement algorithm, the Timer Facility can establish the 
working-set window. 

The Timer Facility can be shared among multiple processes. This sharing is 
accomplished by the implementation of a queue for timer events, which are sorted in order 
of increasing event time. On each occurrence of a Timer interrupt, the TRV field is set 
for the interval between the next two events in the queue, while the Timer Counter 
Register is counting the current interval (because of a previous setting of the TRV 
field). The event at the beginning of the queue identifies other system actions to be 
taken for the Timer interrupt. This event is removed from the queue after the appropri- 
ate actions are taken. 
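
The queue scheme can be sketched as follows. This is a minimal model under stated assumptions: event times are absolute cycle counts, the function name and the event tuples are invented for the example, and the one-cycle reload overhead of the counter is ignored.

```python
import heapq


def run_timer_queue(events):
    """Process (event_time, name) pairs in time order.

    Returns a list of (name, trv) pairs, where trv is the value that would be
    programmed into the TRV field while handling that event, or None when
    fewer than two events remain.
    """
    queue = list(events)
    heapq.heapify(queue)
    log = []
    while queue:
        # The event at the head of the queue is the one whose interval just expired.
        _, name = heapq.heappop(queue)
        # The counter is already timing the interval up to the new head of the
        # queue; TRV is set for the interval between the next two events.
        if len(queue) >= 2:
            nearest, following = heapq.nsmallest(2, queue)
            trv = following[0] - nearest[0]
        else:
            trv = None
        log.append((name, trv))
    return log


schedule = run_timer_queue([(45, "c"), (10, "a"), (50, "d"), (25, "b")])
```

While event "a" is being handled the counter is already timing the 15-cycle interval up to "b", so the value programmed is the 20-cycle interval from "b" to "c".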

7.4 PIPELINE FEATURES EXPOSED TO SOFTWARE 

In certain cases, the Am29050 microprocessor pipeline is exposed during instruction 
execution, in that the execution of certain instructions is dependent on the execution 
of previous instructions. This section discusses the cases where the pipeline is 
exposed to software, and the resulting effect on instruction execution. 

7.4.1 Delayed Branch 

The effect of jump and call instructions is delayed by one cycle to allow the processor 
pipeline to achieve maximum throughput. When one of these branches is successful, 
the instruction immediately following the jump or call is executed before the target 
instruction of the jump or call is executed. Jump and call instructions collectively are 
referred to as delayed branches, and the immediately following instruction is called 
the delay instruction. 

For example, in the following code fragment: 

        cpeq    gr96, lr6, lr7      (1) 
        jmpf    gr96, label         (2) 
        sub     lr6, lr6, 1         (3) 
        const   lr6, 0              (4) 

label:  call    lr0, sort           (5) 
        add     lr2, lr5, 0         (6) 
        cpneq   lr3, gr96, 0        (7) 



The SUB instruction (3) is executed regardless of the outcome of the JMPF instruction 
(2). Of course, if the JMPF is not successful, the CONST instruction (4) is also 
executed. If the JMPF is successful, then the instruction sequence is: (3), (5), (6), and 
then the first instruction of the SORT procedure. Note that the CALL instruction (5) is 
also a delayed branch, so the instruction immediately following it, (6), is always 
executed. After the SORT procedure executes the return sequence, the CPNEQ 
instruction (7) is the next instruction executed. 

The benefit of delayed branches is improved performance and a simplified processor 
implementation. Performance is improved because the processor pipeline executes 
useful instructions in a larger number of cycles, compared to an implementation with- 
out delayed branches. 

For example, ignoring all other effects on performance, and assuming that 15% of all 
instructions are branches, then a processor without delayed branches would take at 
least two cycles for 15% of its instructions, leading to 0.85(1) + 0.15(2) = 1.15 cycles 
per instruction, on average. This represents a 15% performance degradation com- 
pared to a processor with delayed branches (assuming, for this simple example, that 
the delay instruction is always useful). 
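
The arithmetic in this example can be reproduced with a short calculation. The function name and parameters are chosen for the illustration; the 15% branch fraction and the two-cycle branch cost are the assumptions stated in the text.

```python
def average_cycles_per_instruction(branch_fraction, branch_cycles=2, other_cycles=1):
    """Weighted average of cycle counts for branch and non-branch instructions."""
    return (1 - branch_fraction) * other_cycles + branch_fraction * branch_cycles


# Without delayed branches, 15% of instructions take two cycles.
without_delay_slots = average_cycles_per_instruction(0.15)

# With delayed branches (and an always-useful delay instruction), every
# instruction takes one cycle.
with_delay_slots = average_cycles_per_instruction(0.15, branch_cycles=1)

slowdown = without_delay_slots / with_delay_slots - 1
```

The ratio of the two averages is the 15% degradation quoted above.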

The cost of having delayed branches is either the extra effort required when the com- 
piler takes advantage of delayed branches (by re-organizing code), or the extra 
NO-OP instruction which the compiler inserts after every branch to guarantee correct 
program operation. Since the compiler expends only a small amount of effort to avoid 
wasting time and space with NO-OPs, and since the performance improvement result- 
ing from this effort is significant, delayed branches are beneficial overall. 

When two immediately adjacent branches are taken, the target of the first branch 
preempts execution of the delay cycle of the second branch, and the target of the 
second branch then follows the target of the first branch. For example, in the following 
code fragment: 



        jmp     L1                    (1) 
        jmp     L2                    (2) 
        add     lr4, lr4, lr5         (3) 

L1:     sub     gr96, gr96, 1         (4) 
        subc    gr97, gr97, 0         (5) 

L2:     const   gr100, 0xff0f         (6) 
        subr    gr101, gr101, 1       (7) 
        or      gr100, gr100, gr101   (8) 



An unconditional JMP instruction (1) is followed immediately by another unconditional 
JMP instruction (2). (In this example, unconditional JMPs are used; however, any two 
immediately adjacent taken branches exhibit the same behavior.) The sequence of 
executed instructions in this case is: JMP instruction (1), JMP instruction (2), SUB 
instruction (4), CONST instruction (6), SUBR instruction (7), OR instruction (8), and so 
on. Note that the ADD instruction (3) is not executed. Also, the target of the first JMP 
instruction (1) was merely visited; control did not continue sequentially from L1 but 
rather continued from L2. 
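
The execution order in this example follows from a simple rule: a taken branch changes the address of the instruction after the next one. The sketch below is a minimal model of that rule, not of the actual pipeline; the tuple encoding of the program and the function name are assumptions made for the illustration, and every branch in it is treated as taken.

```python
def trace_delayed_branches(program, labels, max_steps=32):
    """Return the tags of executed instructions under delayed-branch semantics.

    program: list of (tag, opcode, target) tuples; target is None for non-branches.
    labels:  mapping from label name to index in program.
    """
    pc, next_pc = 0, 1
    executed = []
    while pc < len(program) and len(executed) < max_steps:
        tag, opcode, target = program[pc]
        executed.append(tag)
        # By default the instruction after next_pc follows sequentially.
        following = next_pc + 1
        if opcode == "jmp":
            # A taken branch redirects the instruction after the delay instruction.
            following = labels[target]
        pc, next_pc = next_pc, following
    return executed


program = [
    (1, "jmp", "L1"),
    (2, "jmp", "L2"),
    (3, "add", None),
    (4, "sub", None),    # L1
    (5, "subc", None),
    (6, "const", None),  # L2
    (7, "subr", None),
    (8, "or", None),
]
labels = {"L1": 3, "L2": 5}
order = trace_delayed_branches(program, labels)
```

The second JMP occupies the delay slot of the first, so its own delay slot is filled by the instruction at L1, which is why (4) executes once and (3) and (5) never do.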



7.4.2 Overlapped Operations 



The Am29050 microprocessor overlaps external data references with other opera- 
tions, and typically performs floating-point operations in parallel with integer opera- 
tions and with other floating-point operations. Certain programming practices are 
necessary to exploit this parallelism to improve program performance. 



7.4.2.1 EXTERNAL ACCESS 



In order to make full use of overlapped storage accesses, some instruction 
reorganization may be necessary. For example, in the following sequence: 



loop: 
        sll     gr121, gr119, 2       (1) 
        add     gr121, gr120, gr121   (2) 
        load    0, 0, gr121, gr121    (3) 
        add     gr96, gr96, gr121     (4) 
        sub     gr96, gr96, 3         (5) 
        add     gr119, gr119, 1       (6) 
        cplt    gr122, gr119, lr2     (7) 
        jmpt    gr122, loop           (8) 
        nop                           (9) 



the ADD instruction (4) uses the result of the LOAD instruction (3). However, the 
following four instructions do not depend on the result of the LOAD. Therefore, the 
ADD instruction (4) can be moved past the JMPT (8), since it always will be executed 
even if the JMPT is taken, and replace the NO-OP instruction (9). The resulting 
sequence is: 

loop: 
        sll     gr121, gr119, 2       (1) 
        add     gr121, gr120, gr121   (2) 
        load    0, 0, gr121, gr121    (3) 
        sub     gr96, gr96, 3         (4) 
        add     gr119, gr119, 1       (5) 
        cplt    gr122, gr119, lr2     (6) 
        jmpt    gr122, loop           (7) 
        add     gr96, gr96, gr121     (8) 



The instructions (4) through (7) are likely to be executed while external memory satis- 
fies the load request, resulting in improved throughput. The processor thus allows 
parallelism to be exploited by instruction reordering. 

The overlapped load feature may be used to improve processor performance, but 
imposes no constraints on instruction sequences, as delayed branches do. The proc- 
essor implements the proper pipeline interlocks to make this parallelism transparent 
to a running program. 

7.4.2.2 FLOATING-POINT UNIT OPERATION 

Programs that use floating-point instructions can also benefit from instruction schedul- 
ing. Each of the individual floating-point pipelines (Adder, Multiplier, Divider/Square- 
Root Unit) can operate in parallel with integer instructions and external accesses, and 
with each other. Parallel execution is possible as long as subsequent instructions do 
not need the results of parallel floating-point operations. For example, consider the 
following code sequence: 

; a = b + c*d - e/f 
; g = *p + i<<2; 

INST    OPERANDS        START ON CYCLE 

fmul    t1, c, d         1 
fadd    t1, b, t1        4 
fdiv    t2, e, f         5 
fsub    a, t1, t2       16 
load    0, 0, t1, p     17 
sll     t2, i, 2        18 
add     g, t1, t2       19 



The two program statements are independent, so they can be rearranged to take 
better advantage of the parallelism in the Floating-Point Unit: 



INST    OPERANDS        START ON CYCLE 

fdiv    t1, e, f         1 
fmul    t2, c, d         2 
load    0, 0, t3, p      3 
sll     t4, i, 2         4 
fadd    t2, b, t2        5 
add     g, t3, t4        6 
fsub    a, t2, t1       11 

Note that the scheduled version of the code fragment uses more temporary registers 
(tn) to hold the results of parallel computations. The large register file of the Am29050 
microprocessor facilitates this kind of code scheduling. 

7.4.3 Delayed Effects of Registers 

The modification of some registers has a delayed effect on processor behavior, be- 
cause of the processor pipeline. The affected registers are the Stack Pointer (Global 
Register 1), Indirect Pointers A, B, and C, the MMU Configuration Register, and the 
Current Processor Status Register. 

An instruction that writes to the Stack Pointer can be followed immediately by an 
instruction that reads the Stack Pointer. However, any instruction that references a 
local register also uses the value of the Stack Pointer to calculate an absolute-register 
number. At least one cycle of delay must separate an instruction that updates the 
Stack Pointer and an instruction that references a local register. In most systems, this 
affects procedure call and return only (see Section 7.1.2). In general, though, an 
instruction that immediately follows a change to the Stack Pointer should not 
reference a local register (however, note that this restriction does not apply to a reference 
of a local register via an indirect pointer). 

The indirect pointers have an implementation similar to the Stack Pointer, and exhibit 
similar behavior. At least one cycle of delay must separate an instruction that modifies 
an indirect pointer and an instruction that uses that indirect pointer to access a 
register. 

Note that it normally is not possible to guarantee that the delayed effect of the Stack 
Pointer and indirect pointers is visible to a program. If an interrupt or trap is taken 
immediately after one of these registers is set, then the interrupted routine sees the 
effect of the setting in the following instruction, because many cycles elapse between 
the two instructions. For this reason, a program should not be written in a manner that 
relies on the delayed effect; the results of this practice may be unpredictable. 

At least one cycle of delay must separate a Move To Special Register that modifies 
the Page Size (PS) field of the MMU Configuration Register and an instruction that 
performs address translation. The latter instruction includes successful branches, 
loads, and stores. 

If the Freeze (FZ) bit of the Current Processor Status Register is reset from 1 to 0, two 
cycles are required before all program state is reflected properly in the registers 
affected by the FZ bit. This implies that interrupts and traps cannot be enabled until 
two cycles after the FZ bit is reset, for proper sequencing of program state. 

CHAPTER 8 

INSTRUCTION SET 



This chapter provides a specification of the Am29050 microprocessor instruction set. 
Sections 8.1 through 8.3 describe the terminology used, the setting of the ALU Status 
Register by instructions, and the instruction formats. Section 8.4 describes each 
instruction in detail; instructions are presented alphabetically by assembler 
mnemonic. Finally, Section 8.5 gives an index of instructions by operation code. 



8.1 INSTRUCTION-DESCRIPTION NOMENCLATURE 

To simplify the specification of the instruction set, special terminology is used through- 
out this chapter. This section defines the terminology and symbols used to describe 
instruction operands, operations, and the assembly-language syntax. 

This section does not describe all terminology used. It excludes certain descriptive 
terms that have an obvious meaning. 



8.1.1 Operand Notation and Symbols 

Throughout this chapter, instruction operands are signed, two's-complement, word 
integers, unless otherwise noted. The term register is used consistently to denote a 
general-purpose register; other types of registers are described explicitly. 

The following notation is used in the description of instruction operands: 

0I16            16-bit immediate data, zero-extended to 32 bits. 

1I16            16-bit immediate data, one-extended to 32 bits. 

BP              The Byte Pointer (BP) field of the ALU Status Register. The BP 
                field selects a byte or half-word within a word, and is interpreted 
                according to the Byte Order bit of the Configuration Register. 

C               The Carry (C) bit of the ALU Status Register. The C bit is 
                logically zero-extended to 32 bits when it is involved in a word 
                operation. 

COUNT           The value of the Count Remaining field of the Channel Control 
                Register. Note that COUNT does not refer to this field directly, 
                but rather to the value of the field at the beginning of a LOADM 
                or STOREM instruction. 

DEST            The general-purpose register that is the destination of an 
                instruction (i.e., the register used to store the result). 

EXTERNAL        The word in an external device or memory with address n. This 
WORD[n]         terminology also is used for coprocessor words, except that the 
                address n either has no pre-defined interpretation or is a data 
                item transferred to the coprocessor. 

FALSE           The Boolean constant FALSE. 

FC              The Funnel Shift Count (FC) field of the ALU Status Register. 

h'n'            The hexadecimal constant n. 

I16             16-bit immediate data. 

IPA             Indirect Pointer A Register. 

IPB             Indirect Pointer B Register. 

IPC             Indirect Pointer C Register. 

PC              The Program Counter Register. This register is not explicitly 
                accessible by instruction, but does appear as an operand for 
                certain instructions. The Program Counter always contains the word 
                address of the instruction being executed, and is 30 bits in 
                length. 

Q               The Q Register. 

Register RA     These designate the general-purpose registers specified by the 
Register RB     instruction fields RA, RB, and RC (see Section 8.3). 
Register RC 

SPDEST          The special-purpose register that is the destination of an 
                instruction. 

SPECIAL         The content of a special-purpose register, used as an instruction 
                operand. 

Special-purpose Designates the special-purpose register specified by the 
Register SA     instruction field SA (see Section 8.3). 

SRCA            The contents of general-purpose registers, used as instruction 
SRCB            operands. 

SRCA.BYTEn      Designate the byte numbered n within the SRCA or SRCB 
SRCB.BYTEn      operand. 

TARGET          The target-instruction address specified by a jump or call 
                instruction. This address is either absolute, or Program-Counter 
                relative. 

TLB[n]          The Translation Look-Aside Buffer Register with register 
                number n. 

TRUE            The Boolean constant TRUE. 

TWIN            General-purpose registers are paired by absolute-register number, 
                such that even-numbered registers are paired with odd-numbered 
                registers having the next-highest register number. The twin 
                of a given register is the other register in the pair to which the 
                given register belongs. For example, Local Register 5 is the twin 
                of Local Register 4, and vice versa. 
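
A few of the operand notations above are easy to restate as small functions. The sketch below is only an illustration of the definitions; the function names are invented, and the twin calculation assumes the register number passed in is an absolute-register number.

```python
def zero_extend_16(immediate):
    """16-bit immediate data, zero-extended to 32 bits."""
    return immediate & 0xFFFF


def one_extend_16(immediate):
    """16-bit immediate data, one-extended to 32 bits (upper half all ones)."""
    return 0xFFFF0000 | (immediate & 0xFFFF)


def twin(absolute_register_number):
    """Other member of the even/odd pair containing the given register."""
    # Flipping the low bit maps an even number to the next odd number and back.
    return absolute_register_number ^ 1


zero_extended = zero_extend_16(0x1234)
one_extended = one_extend_16(0x1234)
pair = (twin(4), twin(5))
```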



8.1.2 Operator Symbols 

The following symbols are used to describe instruction operations: 

A << B          Left shift of the A operand by the shift amount given by the B 
                operand. 

A >> B          Right shift of the A operand by the shift amount given by the B 
                operand. 

A // B          Concatenation. The B operand is appended to the A operand. In 
                the resulting quantity, the A operand makes up the high-order 
                part, and the B operand makes up the low-order part. 

A & B           Bitwise AND. 

A | B           Bitwise OR. 

A ^ B           Bitwise exclusive-OR. 

~A              One's-complement. 

A <- exp        Assignment of the A location by the result of the expression on 
                the right side. 

A = B           Equal to. 

A <> B          Not equal to. 

A > B           Greater than. 

A >= B          Greater than or equal to. 

A < B           Less than. 

A <= B          Less than or equal to. 

A + B           Addition. 

A - B           Subtraction. 

A * B           Multiplication. 

A / B           Division. 

A..B            A subrange which includes the A operand and the B operand. 
                This symbol is used for subranges of bits as well as subranges of 
                words. 

A OR B          Logical OR of two Boolean conditions. 



8.1.3 Control-Flow Terminology 

The following terminology is used to describe the control functions performed during 
the execution of various instructions: 



Continue        Continue execution of the current instruction sequence. 

IF condition    The condition following the IF is tested. If the condition holds, the 
THEN operations operations following the THEN are performed. If the condition 
ELSE operations does not hold, the operations following the ELSE are performed. 
                If the ELSE is not present and the condition does not hold, no 
                operation is performed. 

Signed overflow This condition is present when the result of an add or subtract of 
                two's-complement operands cannot be represented by a signed 
                word integer. 

Trap(n)         Specifies a trap with vector number n. The vector number n may 
                be specified indirectly (e.g., Trap (VN)) or explicitly by symbolic 
                name (e.g., Trap (Out of Range)). 

Unsigned        This condition is present when the result of an add of unsigned 
overflow        operands cannot be represented by an unsigned word integer. 

Unsigned        This condition is present when the result of a subtract of 
underflow       unsigned operands cannot be represented by an unsigned integer 
                (i.e., when the result is less than zero). 

VN              Designates the trap vector number specified by the instruction 
                field VN (see Section 8.3). 

8.1.4 Assembler Syntax 

This chapter does not contain a full description of the instruction assembler, but 
provides a rudimentary description of the assembler syntax. The following notation is 
used to describe assembler tokens: 

ce              Determines the Coprocessor Enable (CE) bit of a load or store 
                instruction. 

cntl            Determines the 7-bit control field in a load or store instruction. 

const8          Specifies a constant that can be expressed by 8 bits. 

const16         Specifies a constant that can be expressed by 16 bits. 

ra              These tokens name general-purpose registers. In a formal 
rb              sense, these represent the same token, since the name of a 
rc              register does not depend on its instruction use. However, three 
                distinct tokens are used to clarify the relationship between the 
                assembler syntax, instruction operands, and instruction fields. 

spid            A symbolic identifier for a special-purpose register. 

target          A symbolic label for the target of a jump or call instruction. 

vn              Specifies a trap vector number. 

8.2 ARITHMETIC/LOGIC STATUS RESULTS OF INSTRUCTIONS 

8.2.1 Arithmetic/Logic Status Bits 

The arithmetic/logic status bits of the ALU Status Register are: 

V Overflow 

N Negative 

Z Zero 

C Carry 

The C bit is used in extended arithmetic operations (i.e., on operands greater than 32 
bits in length), and the N bit is used in divide step operations. Other than these uses, 
the status bits are not involved in instruction operations. In particular, they are not 
used to determine the outcome of conditional jump instructions; Boolean values in 
registers are used instead for this purpose. The status bits are primarily informational. 

Except for instructions that explicitly modify the ALU Status Register, the status bits 
are modified only by the execution of instructions in the Arithmetic and Logical 
classes. The Arithmetic and Logical instructions affect the status bits differently. The 
following two sections describe the setting of the status bits by Arithmetic and Logical 
instructions. 

When the Freeze (FZ) bit of the Current Processor Status Register is 1, the ALU 
Status Register is not modified except by the Move To Special Register instruction. 

8.2.2 Arithmetic Operation Status Results 

The Arithmetic instructions modify the V, N, Z, and C bits. These bits are set accord- 
ing to the result of the operation performed by the instruction. 

All instructions in the Arithmetic class (except for MULTIPLY, MULTM, DIVIDE, 
MULTIPLU, MULTMU, and DIVIDU) perform an add. In the case of subtraction, the 
subtract is performed by adding the two's-complement or one's-complement of an 
operand to the other operand. The multiply step and divide step operations also 
perform adds, again possibly complementing one of the operands before the operation 
is performed. In general, the status bits are based on the results of the add. 

If two's-complement overflow occurs during the add, the V bit of the ALU Status 
Register is set; otherwise it is reset. Two's-complement overflow occurs when the carry-in 
to the most-significant bit of the intermediate result differs from the carry-out. When 
this occurs, the result cannot be represented by a signed word integer. Note that the 
V bit always is set in this manner, even when the result is unsigned. 

The N bit of the ALU Status Register is set to the value of the most-significant bit of 
the result of the add. Note that the divide step and multiply step operations may shift 
the result after the operation is performed. In the cases where shifting occurs, the N 
bit may not agree with the result that is written into a general-purpose register, since 
the N bit is based only on the result of the add, not on the shift. 

If the result of the add causes a zero word to be written to a general-purpose register, 
the Z bit of the ALU Status Register is set; otherwise, it is reset. The Z bit always 
reflects the result written into a general-purpose register; if shifting is performed by a 
multiply or divide step, the Z bit reflects the shifted value. 

If there is a carry out of the add operation, the C bit is set; otherwise it is reset. 
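
The rules above for V, N, Z, and C can be illustrated with a short model of a 32-bit add. This is a sketch of the arithmetic only (no shifting, no multiply or divide steps); the function names and the dictionary of status bits are conventions invented for the example.

```python
WORD_MASK = 0xFFFFFFFF


def add_with_status(a, b, carry_in=0):
    """32-bit add returning (result, status bits) per the rules described above."""
    a &= WORD_MASK
    b &= WORD_MASK
    total = a + b + carry_in
    result = total & WORD_MASK
    carry_out = (total >> 32) & 1
    # Carry into the most-significant bit, computed from the low 31 bits.
    carry_into_msb = (((a & 0x7FFFFFFF) + (b & 0x7FFFFFFF) + carry_in) >> 31) & 1
    status = {
        "V": carry_into_msb ^ carry_out,  # two's-complement overflow
        "N": result >> 31,                # most-significant bit of the result
        "Z": int(result == 0),            # zero word written
        "C": carry_out,                   # carry out of the add
    }
    return result, status


def sub_with_status(a, b):
    """Subtract performed as an add of the one's-complement plus a carry-in of 1."""
    return add_with_status(a, ~b & WORD_MASK, carry_in=1)


signed_overflow = add_with_status(0x7FFFFFFF, 1)
unsigned_overflow = add_with_status(0xFFFFFFFF, 1)
no_borrow = sub_with_status(5, 5)
borrow = sub_with_status(3, 5)
```

Note that a subtract with no borrow leaves C set to 1 in this model, so C equal to 0 after an unsigned subtract indicates underflow.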

8.2.2.1 CORRECTING OUT-OF-RANGE RESULTS 

Some Arithmetic instructions cause an Out of Range trap if the arithmetic operation 
causes an overflow or underflow. When an Out of Range trap occurs, the result of the 
operation — though incorrect — is written into the destination register. Furthermore, the 
Program Counter 2 Register contains the address of the trapping instruction, and the 
ALU Status Register contains an indication of the cause of the trap. It is possible, if 
required, for the trap handler to use this information to form the correct result. 

The ALU Status indicates the cause of the Out of Range trap, based on the operation 
performed, as follows: 

1. Signed overflow. If the Out of Range trap is caused by signed, two's-complement 
overflow (this can occur for both signed adds and subtracts), the V bit is 1. 

2. Unsigned overflow. If the Out of Range trap is caused by unsigned overflow (this 
can occur only for unsigned adds), the C bit is 1. 

3. Unsigned underflow. If the Out of Range trap is caused by unsigned underflow 
(this can occur only for unsigned subtracts), the C bit is 0. 

The multiply instructions MULTIPLY and MULTIPLU can cause an Out of Range trap 
if the MO bit of the Integer Environment Register is 0 and the operation overflows. 
However, these instructions do not set the ALU Status Register. This exception is 
detected using the Exception Opcode Register. 

8.2.3 Logical Operation Status Results 

The Logical instructions modify the N and Z bits. These bits are set according to the 
result of the instruction. The V and C bits are meaningless in regard to the logical 
instructions, so they are not modified. 

The N bit of the ALU Status Register is set to the value of the most-significant bit of 
the result of the logical operation. 

If the result of the logical operation is a zero word, the Z bit of the ALU Status Register 
is set; otherwise, it is reset. 

8.2.4 Floating-Point Status 

The floating-point instructions check for a number of exceptional conditions, and 
report these exceptions by setting bits of the Floating-Point Status Register (see 
Section 3.2.3). The exceptional conditions also may cause traps, depending on the 
state of mask bits in the Floating-Point Environment Register. There are two groups of 
status bits in the Floating-Point Status Register: trap status bits and sticky status bits. 
When an exception is detected, the Am29050 microprocessor sets the trap status bit 
and/or the sticky status bit associated with the exception, depending on the corre- 
sponding exception mask bit and on whether or not a trap occurs. The sticky status bit 
is set whenever the corresponding exception is masked, regardless of whether or not 
a trap occurs. A trap status bit is set whenever a trap occurs, regardless of the state 
of the corresponding mask bit. 

A trap status bit is reset when a trap occurs and the indicated status does not apply to 
the trapping operation. A sticky status bit is reset only by software. 

Since a floating-point exception may affect either a trap status bit, a sticky status bit, 
or both, the description of status results for floating-point instructions in this section 
indicates the exceptions that may be detected, rather than which status bits are set. 
The following terminology is used: 

fpD Divide By Zero. The processor determines whether a divide operation has a 
zero divisor and a non-zero, finite dividend. If so, the DT and/or DS bits of the 
Floating-Point Status Register are set. 

fpX Inexact Result. If the result of the associated floating-point operation is not 
equal to the infinitely-precise result, the XT and/or XS bits of the Floating-Point 
Status Register are set. 

fpU Underflow. If the result of the associated floating-point operation is too small to 
be expressed in the destination format, the UT and/or US bits of the Floating- 
Point Status Register are set. 

fpV Overflow. If the result of the associated floating-point operation is too large to be 
expressed in the destination format, the VT and/or VS bits of the Floating-Point 
Status Register are set. 

fpR Reserved Operand. If one or more input operands to the associated 

floating-point operation is a reserved value, or if the result of this floating-point 
operation is a reserved value, the RT and/or RS bits of the Floating-Point Status 
Register are set. 

fpN Invalid Operation. If the input operands to the associated floating-point 
operation produce an indeterminate result, the NT and/or NS bits of the 
Floating-Point Status Register are set. 

8.3 INSTRUCTION FORMATS 

All instructions for the Am29050 microprocessor are 32 bits in length, and are divided 
into four fields, as shown in Figure 8-1. These fields have several alternative defini- 
tions, as discussed below. In certain instructions, one or more fields are not used, 
and are reserved for future use. Even though they have no effect on processor 
operation, bits in reserved fields should be 0, to ensure compatibility with future 
processor versions. 
Figure 8-1   Instruction Format 

  Bits 31-24      Bits 23-16      Bits 15-8      Bits 7-0 

  OP              RC              RA             RB 
  (bit 24 is      I17...I10       SA             RB or I 
  A or M in       I15...I8                       I9...I2 
  some            VN                             I7...I0 
  instructions)   CE//CNTL                       UI//RND//FD//FS 
                                                 Reserved//FS 

The instruction fields are defined as follows: 

Bits 31-24 

OP          This field contains an operation code, defining the operation to 
            be performed. In some instructions, the least-significant bit of the 
            operation code selects between two possible operands. For this 
            reason, the least-significant bit is sometimes labeled A or M, with 
            the following interpretations: 

A           (Absolute): The A bit is used to differentiate between Program- 
            Counter relative (A = 0) and absolute (A = 1) instruction 
            addresses, when these addresses appear within instructions. 

M           (Immediate): The M bit selects between a register operand 
            (M = 0) and an immediate operand (M = 1), when the alternative 
            is allowed by an instruction. 

Bits 23-16 

RC          The RC field contains a global or local register number. 

I17...I10   This field contains the most-significant eight bits of a 16-bit 
            instruction address. This is a word address, and may be 
            program-counter relative or absolute, depending on the A bit of 
            the operation code. 

I15...I8    This field contains the most-significant eight bits of a 16-bit 
            instruction constant. 

VN          This field contains an 8-bit trap vector number. 

CE//CNTL    This field controls a load or store access, as described in 
            Sections 3.4.4 and 6.1.2. 

Bits 15-8 

RA          The RA field contains a global or local register number. 

SA          The SA field contains a special-purpose register number. 

Bits 7-0 

RB          The RB field contains a global or local register number. 

RB or I     This field contains either a global or local register number, or an 
            8-bit instruction constant, depending on the value of the M bit of 
            the operation code. 

I9...I2     This field contains the least-significant eight bits of a 16-bit 
            instruction address. This is a word address, and may be 
            program-counter relative or absolute, depending on the A bit of 
            the operation code. 

I7...I0     This field contains the least-significant eight bits of a 16-bit 
            instruction constant. 

UI//RND//FD//FS 
            This field controls the operation of the CONVERT instruction. 

Reserved//FS 
            This field is the FS portion of the above field and specifies the 
            operand format for the CLASS and SQRT instructions. 

The fields described above may appear in many combinations. However, certain 
combinations that appear frequently are shown in Figure 8-2. 



Figure 8-2   Frequently Occurring Instruction Field Uses 

                                          Bits 31-24  Bits 23-16  Bits 15-8  Bits 7-0 

Three operands, with possible 
8-bit constant:                           XXXXXXXM    RC          RA         RB or I 

Three operands, without constant:         XXXXXXX0    RC          RA         RB 

One register operand, with 
16-bit constant:                          XXXXXXX1    I15...I8    RA         I7...I0 

Jumps and calls with 16-bit 
instruction address:                      XXXXXXXA    I17...I10   RA         I9...I2 

Two operands with trap vector number:     XXXXXXXM    VN          RA         RB or I 

Loads and stores:                         XXXXXXXM    CE//CNTL    RA         RB or I 
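
The four-field layout above can be sketched in Python. This is only an illustrative model, not part of any AMD tool: the function names `decode_fields` and `const16` are hypothetical, and the sketch assumes nothing beyond the bit positions listed in the field definitions.

```python
def decode_fields(word):
    """Split a 32-bit instruction word into its four 8-bit fields."""
    word &= 0xFFFFFFFF
    return {
        "op": (word >> 24) & 0xFF,    # bits 31-24: operation code
        "rc": (word >> 16) & 0xFF,    # bits 23-16: RC, I17..I10, I15..I8, VN or CE//CNTL
        "ra": (word >> 8) & 0xFF,     # bits 15-8: RA or SA
        "rb": word & 0xFF,            # bits 7-0: RB, I, I9..I2 or I7..I0
        "low_bit": (word >> 24) & 1,  # A or M bit, where the opcode defines one
    }


def const16(word):
    """Join I15..I8 (bits 23-16) and I7..I0 (bits 7-0) into a 16-bit constant."""
    fields = decode_fields(word)
    return (fields["rc"] << 8) | fields["rb"]


fields = decode_fields(0x1503042A)   # OP = 15 (ADD with M = 1)
print(fields)
```

Note that the two halves of a 16-bit constant or address are not adjacent in the word: the RA field sits between them, so they must be joined explicitly.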
8.4 INSTRUCTION DESCRIPTION 



This section describes each Am29050 microprocessor instruction in detail. Figure 8-3 
illustrates the layout of the information given for each description. 



Figure 8-3   Instruction-Description Format 

[Figure: the ADD instruction description, annotated with the following labels.] 

  Instruction Mnemonic 
  Instruction Name 
  Brief Operation Description 
  Assembler Syntax 
  Arithmetic/Logic Status Result 
  Operand Specification: describes the instruction fields' relations to 
    operands, and implicit operands in some cases 
  Instruction Format: specifies field options used 
  Operation Code: HEX format 
  Detailed Description of instruction operation 
ADD          Add 

Operation:   DEST <- SRCA + SRCB 

Assembler 
Syntax:      ADD rc, ra, rb 
             or 
             ADD rc, ra, const8 

Status:      V, N, Z, C 

Operands:    SRCA  Content of register RA 
             SRCB  M = 0: Content of register RB 
                   M = 1: I (Zero-extended to 32 bits) 
             DEST  Register RC 

Format:      Bits 31-24  0001010M 
             Bits 23-16  RC 
             Bits 15-8   RA 
             Bits 7-0    RB or I 

             OP = 14, 15 

Description: The SRCA operand is added to the SRCB operand, and the result is 
             placed into the DEST location. 
ADDC         Add with Carry 

Operation:   DEST <- SRCA + SRCB + C 

Assembler 
Syntax:      ADDC rc, ra, rb 
             or 
             ADDC rc, ra, const8 

Status:      V, N, Z, C 

Operands:    SRCA  Content of register RA 
             SRCB  M = 0: Content of register RB 
                   M = 1: I (Zero-extended to 32 bits) 
             DEST  Register RC 

Format:      Bits 31-24  0001110M 
             Bits 23-16  RC 
             Bits 15-8   RA 
             Bits 7-0    RB or I 

             OP = 1C, 1D 

Description: The SRCA operand is added to the SRCB operand and the value of 
             the ALU Status Carry bit, and the result is placed into the DEST 
             location. 
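
A common use of the carry input is multi-word arithmetic. The following Python sketch models a 64-bit addition built from a 32-bit ADD followed by a 32-bit ADDC. The helper names `add32` and `add64` are hypothetical, and the model covers only the result and carry-out, not the V, N and Z status bits.

```python
MASK32 = 0xFFFFFFFF


def add32(srca, srcb, carry_in=0):
    """Model of ADD (carry_in = 0) or ADDC: returns (result, carry_out)."""
    total = (srca & MASK32) + (srcb & MASK32) + carry_in
    return total & MASK32, total >> 32


def add64(a_lo, a_hi, b_lo, b_hi):
    """Add two 64-bit values held as (low word, high word) pairs."""
    lo, carry = add32(a_lo, b_lo)          # ADD: low words, produces C
    hi, _ = add32(a_hi, b_hi, carry)       # ADDC: high words plus C
    return lo, hi


print(add64(0xFFFFFFFF, 0x00000000, 0x00000001, 0x00000000))
```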
ADDCS        Add with Carry, Signed 

Operation:   DEST <- SRCA + SRCB + C 
             IF signed overflow THEN Trap (Out of Range) 

Assembler 
Syntax:      ADDCS rc, ra, rb 
             or 
             ADDCS rc, ra, const8 

Status:      V, N, Z, C 

Operands:    SRCA  Content of register RA 
             SRCB  M = 0: Content of register RB 
                   M = 1: I (Zero-extended to 32 bits) 
             DEST  Register RC 

Format:      Bits 31-24  0001100M 
             Bits 23-16  RC 
             Bits 15-8   RA 
             Bits 7-0    RB or I 

             OP = 18, 19 

Description: The SRCA operand is added to the SRCB operand and the value of 
             the ALU Status Carry bit, and the result is placed into the DEST 
             location. If the add operation causes a two's-complement signed 
             overflow, an Out of Range trap occurs. 

             Note that the DEST location is altered whether or not an overflow 
             occurs. 
ADDCU        Add with Carry, Unsigned 

Operation:   DEST <- SRCA + SRCB + C 
             IF unsigned overflow THEN Trap (Out of Range) 

Assembler 
Syntax:      ADDCU rc, ra, rb 
             or 
             ADDCU rc, ra, const8 

Status:      V, N, Z, C 

Operands:    SRCA  Content of register RA 
             SRCB  M = 0: Content of register RB 
                   M = 1: I (Zero-extended to 32 bits) 
             DEST  Register RC 

Format:      Bits 31-24  0001101M 
             Bits 23-16  RC 
             Bits 15-8   RA 
             Bits 7-0    RB or I 

             OP = 1A, 1B 

Description: The SRCA operand is added to the SRCB operand and the value of 
             the ALU Status Carry bit, and the result is placed into the DEST 
             location. If the add operation causes an unsigned overflow, an 
             Out of Range trap occurs. 

             Note that the DEST location is altered whether or not an overflow 
             occurs. 
ADDS         Add, Signed 

Operation:   DEST <- SRCA + SRCB 
             IF signed overflow THEN Trap (Out of Range) 

Assembler 
Syntax:      ADDS rc, ra, rb 
             or 
             ADDS rc, ra, const8 

Status:      V, N, Z, C 

Operands:    SRCA  Content of register RA 
             SRCB  M = 0: Content of register RB 
                   M = 1: I (Zero-extended to 32 bits) 
             DEST  Register RC 

Format:      Bits 31-24  0001000M 
             Bits 23-16  RC 
             Bits 15-8   RA 
             Bits 7-0    RB or I 

             OP = 10, 11 

Description: The SRCA operand is added to the SRCB operand, and the result is 
             placed into the DEST location. If the add operation causes a 
             two's-complement signed overflow, an Out of Range trap occurs. 

             Note that the DEST location is altered whether or not an overflow 
             occurs. 
ADDU         Add, Unsigned 

Operation:   DEST <- SRCA + SRCB 
             IF unsigned overflow THEN Trap (Out of Range) 

Assembler 
Syntax:      ADDU rc, ra, rb 
             or 
             ADDU rc, ra, const8 

Status:      V, N, Z, C 

Operands:    SRCA  Content of register RA 
             SRCB  M = 0: Content of register RB 
                   M = 1: I (Zero-extended to 32 bits) 
             DEST  Register RC 

Format:      Bits 31-24  0001001M 
             Bits 23-16  RC 
             Bits 15-8   RA 
             Bits 7-0    RB or I 

             OP = 12, 13 

Description: The SRCA operand is added to the SRCB operand, and the result is 
             placed into the DEST location. If the add operation causes an 
             unsigned overflow, an Out of Range trap occurs. 

             Note that the DEST location is altered whether or not an overflow 
             occurs. 
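
The difference between the signed and unsigned overflow conditions can be illustrated with a short Python model. It is a sketch under the usual two's-complement definitions, not a description of the processor's internal logic, and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF
SIGN = 0x80000000


def signed_overflow(srca, srcb, carry_in=0):
    """True if SRCA + SRCB + C overflows as a two's-complement sum."""
    srca &= MASK32
    srcb &= MASK32
    result = (srca + srcb + carry_in) & MASK32
    # Overflow: both operands have the same sign and the result sign differs.
    return bool((srca ^ result) & (srcb ^ result) & SIGN)


def unsigned_overflow(srca, srcb, carry_in=0):
    """True if SRCA + SRCB + C does not fit in 32 unsigned bits."""
    return (srca & MASK32) + (srcb & MASK32) + carry_in > MASK32


# The same operands can overflow in one interpretation but not the other.
print(signed_overflow(0x7FFFFFFF, 1), unsigned_overflow(0x7FFFFFFF, 1))
print(signed_overflow(0xFFFFFFFF, 1), unsigned_overflow(0xFFFFFFFF, 1))
```

In this model ADDS and ADDCS would trap where `signed_overflow` is true, and ADDU and ADDCU where `unsigned_overflow` is true; in every case the truncated 32-bit sum is still written to DEST.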
AND          AND Logical 

Operation:   DEST <- SRCA & SRCB 

Assembler 
Syntax:      AND rc, ra, rb 
             or 
             AND rc, ra, const8 

Status:      N, Z 

Operands:    SRCA  Content of register RA 
             SRCB  M = 0: Content of register RB 
                   M = 1: I (Zero-extended to 32 bits) 
             DEST  Register RC 

Format:      Bits 31-24  1001000M 
             Bits 23-16  RC 
             Bits 15-8   RA 
             Bits 7-0    RB or I 

             OP = 90, 91 

Description: The SRCA operand is logically ANDed, bit-by-bit, with the SRCB 
             operand, and the result is placed into the DEST location. 
ANDN         AND-NOT Logical 

Operation:   DEST <- SRCA & ~SRCB 

Assembler 
Syntax:      ANDN rc, ra, rb 
             or 
             ANDN rc, ra, const8 

Status:      N, Z 

Operands:    SRCA  Content of register RA 
             SRCB  M = 0: Content of register RB 
                   M = 1: I (Zero-extended to 32 bits) 
             DEST  Register RC 

Format:      Bits 31-24  1001110M 
             Bits 23-16  RC 
             Bits 15-8   RA 
             Bits 7-0    RB or I 

             OP = 9C, 9D 

Description: The SRCA operand is logically ANDed, bit-by-bit, with the 
             one's-complement of the SRCB operand, and the result is placed 
             into the DEST location. 
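
Because the immediate form of SRCB is an 8-bit constant zero-extended to 32 bits, AND and ANDN behave quite differently with a constant operand. The Python sketch below models both operations on 32-bit values; the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def and_op(srca, srcb):
    """Model of AND: bit-by-bit AND of the two operands."""
    return srca & srcb & MASK32


def andn_op(srca, srcb):
    """Model of ANDN: AND of SRCA with the one's-complement of SRCB."""
    return srca & ~srcb & MASK32


value = 0x12345678
print(hex(and_op(value, 0xFF)))    # keeps only the low byte
print(hex(andn_op(value, 0xFF)))   # clears only the low byte
```

With a zero-extended constant, AND always clears the upper 24 bits, while ANDN leaves them untouched and clears only the low-order bits selected by the constant.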
ASEQ         Assert Equal To 

Operation:   IF SRCA = SRCB THEN Continue 
             ELSE Trap (VN) 

Assembler 
Syntax:      ASEQ vn, ra, rb 
             or 
             ASEQ vn, ra, const8 

Status:      Not affected 

Operands:    SRCA  Content of register RA 
             SRCB  M = 0: Content of register RB 
                   M = 1: I (Zero-extended to 32 bits) 
             VN    Trap vector number 

Format:      Bits 31-24  0111000M 
             Bits 23-16  VN 
             Bits 15-8   RA 
             Bits 7-0    RB or I 

             OP = 70, 71 

Description: If the SRCA operand is equal to the SRCB operand, instruction 
             execution continues; otherwise, a trap with the specified vector 
             number occurs. 

             For programs in the User mode, a Protection Violation trap 
             occurs instead of the assert trap if a vector number between 0 
             and 63 is specified. 
ASGE         Assert Greater Than or Equal To 

Operation:   IF SRCA >= SRCB THEN Continue 
             ELSE Trap (VN) 

Assembler 
Syntax:      ASGE vn, ra, rb 
             or 
             ASGE vn, ra, const8 

Status:      Not affected 

Operands:    SRCA  Content of register RA 
             SRCB  M = 0: Content of register RB 
                   M = 1: I (Zero-extended to 32 bits) 
             VN    Trap vector number 

Format:      Bits 31-24  0101110M 
             Bits 23-16  VN 
             Bits 15-8   RA 
             Bits 7-0    RB or I 

             OP = 5C, 5D 

Description: If the value of the SRCA operand is greater than or equal to the 
             value of the SRCB operand, instruction execution continues; 
             otherwise, a trap with the specified vector number occurs. 

             For programs in the User mode, a Protection Violation trap 
             occurs instead of the assert trap if a vector number between 0 
             and 63 is specified. 
ASGEU        Assert Greater Than or Equal To, Unsigned 

Operation:   IF SRCA >= SRCB (unsigned) THEN Continue 
             ELSE Trap (VN) 

Assembler 
Syntax:      ASGEU vn, ra, rb 
             or 
             ASGEU vn, ra, const8 

Status:      Not affected 

Operands:    SRCA  Content of register RA 
             SRCB  M = 0: Content of register RB 
                   M = 1: I (Zero-extended to 32 bits) 
             VN    Trap vector number 

Format:      Bits 31-24  0101111M 
             Bits 23-16  VN 
             Bits 15-8   RA 
             Bits 7-0    RB or I 

             OP = 5E, 5F 

Description: If the value of the SRCA operand is greater than or equal to the 
             value of the SRCB operand, instruction execution continues; 
             otherwise, a trap with the specified vector number occurs. For 
             the comparison, both operands are treated as unsigned integers. 

             For programs in the User mode, a Protection Violation trap 
             occurs instead of the assert trap if a vector number between 0 
             and 63 is specified. 
ASGT         Assert Greater Than 

Operation:   IF SRCA > SRCB THEN Continue 
             ELSE Trap (VN) 

Assembler 
Syntax:      ASGT vn, ra, rb 
             or 
             ASGT vn, ra, const8 

Status:      Not affected 

Operands:    SRCA  Content of register RA 
             SRCB  M = 0: Content of register RB 
                   M = 1: I (Zero-extended to 32 bits) 
             VN    Trap vector number 

Format:      Bits 31-24  0101100M 
             Bits 23-16  VN 
             Bits 15-8   RA 
             Bits 7-0    RB or I 

             OP = 58, 59 

Description: If the value of the SRCA operand is greater than the value of the 
             SRCB operand, instruction execution continues; otherwise, a trap 
             with the specified vector number occurs. 

             For programs in the User mode, a Protection Violation trap 
             occurs instead of the assert trap if a vector number between 0 
             and 63 is specified. 
ASGTU        Assert Greater Than, Unsigned 

Operation:   IF SRCA > SRCB (unsigned) THEN Continue 
             ELSE Trap (VN) 

Assembler 
Syntax:      ASGTU vn, ra, rb 
             or 
             ASGTU vn, ra, const8 

Status:      Not affected 

Operands:    SRCA  Content of register RA 
             SRCB  M = 0: Content of register RB 
                   M = 1: I (Zero-extended to 32 bits) 
             VN    Trap vector number 

Format:      Bits 31-24  0101101M 
             Bits 23-16  VN 
             Bits 15-8   RA 
             Bits 7-0    RB or I 

             OP = 5A, 5B 

Description: If the value of the SRCA operand is greater than the value of the 
             SRCB operand, instruction execution continues; otherwise, a trap 
             with the specified vector number occurs. For the comparison, both 
             operands are treated as unsigned integers. 

             For programs in the User mode, a Protection Violation trap 
             occurs instead of the assert trap if a vector number between 0 
             and 63 is specified. 
ASLE         Assert Less Than or Equal To 

Operation:   IF SRCA <= SRCB THEN Continue 
             ELSE Trap (VN) 

Assembler 
Syntax:      ASLE vn, ra, rb 
             or 
             ASLE vn, ra, const8 

Status:      Not affected 

Operands:    SRCA  Content of register RA 
             SRCB  M = 0: Content of register RB 
                   M = 1: I (Zero-extended to 32 bits) 
             VN    Trap vector number 

Format:      Bits 31-24  0101010M 
             Bits 23-16  VN 
             Bits 15-8   RA 
             Bits 7-0    RB or I 

             OP = 54, 55 

Description: If the value of the SRCA operand is less than or equal to the 
             value of the SRCB operand, instruction execution continues; 
             otherwise, a trap with the specified vector number occurs. 

             For programs in the User mode, a Protection Violation trap 
             occurs instead of the assert trap if a vector number between 0 
             and 63 is specified. 
ASLEU        Assert Less Than or Equal To, Unsigned 

Operation:   IF SRCA <= SRCB (unsigned) THEN Continue 
             ELSE Trap (VN) 

Assembler 
Syntax:      ASLEU vn, ra, rb 
             or 
             ASLEU vn, ra, const8 

Status:      Not affected 

Operands:    SRCA  Content of register RA 
             SRCB  M = 0: Content of register RB 
                   M = 1: I (Zero-extended to 32 bits) 
             VN    Trap vector number 

Format:      Bits 31-24  0101011M 
             Bits 23-16  VN 
             Bits 15-8   RA 
             Bits 7-0    RB or I 

             OP = 56, 57 

Description: If the value of the SRCA operand is less than or equal to the 
             value of the SRCB operand, instruction execution continues; 
             otherwise, a trap with the specified vector number occurs. For 
             the comparison, both operands are treated as unsigned integers. 

             For programs in the User mode, a Protection Violation trap 
             occurs instead of the assert trap if a vector number between 0 
             and 63 is specified. 
ASLT         Assert Less Than 

Operation:   IF SRCA < SRCB THEN Continue 
             ELSE Trap (VN) 

Assembler 
Syntax:      ASLT vn, ra, rb 
             or 
             ASLT vn, ra, const8 

Status:      Not affected 

Operands:    SRCA  Content of register RA 
             SRCB  M = 0: Content of register RB 
                   M = 1: I (Zero-extended to 32 bits) 
             VN    Trap vector number 

Format:      Bits 31-24  0101000M 
             Bits 23-16  VN 
             Bits 15-8   RA 
             Bits 7-0    RB or I 

             OP = 50, 51 

Description: If the value of the SRCA operand is less than the value of the 
             SRCB operand, instruction execution continues; otherwise, a trap 
             with the specified vector number occurs. 

             For programs in the User mode, a Protection Violation trap 
             occurs instead of the assert trap if a vector number between 0 
             and 63 is specified. 
ASLTU        Assert Less Than, Unsigned 

Operation:   IF SRCA < SRCB (unsigned) THEN Continue 
             ELSE Trap (VN) 

Assembler 
Syntax:      ASLTU vn, ra, rb 
             or 
             ASLTU vn, ra, const8 

Status:      Not affected 

Operands:    SRCA  Content of register RA 
             SRCB  M = 0: Content of register RB 
                   M = 1: I (Zero-extended to 32 bits) 
             VN    Trap vector number 

Format:      Bits 31-24  0101001M 
             Bits 23-16  VN 
             Bits 15-8   RA 
             Bits 7-0    RB or I 

             OP = 52, 53 

Description: If the value of the SRCA operand is less than the value of the 
             SRCB operand, instruction execution continues; otherwise, a trap 
             with the specified vector number occurs. For the comparison, both 
             operands are treated as unsigned integers. 

             For programs in the User mode, a Protection Violation trap 
             occurs instead of the assert trap if a vector number between 0 
             and 63 is specified. 
ASNEQ        Assert Not Equal To 

Operation:   IF SRCA <> SRCB THEN Continue 
             ELSE Trap (VN) 

Assembler 
Syntax:      ASNEQ vn, ra, rb 
             or 
             ASNEQ vn, ra, const8 

Status:      Not affected 

Operands:    SRCA  Content of register RA 
             SRCB  M = 0: Content of register RB 
                   M = 1: I (Zero-extended to 32 bits) 
             VN    Trap vector number 

Format:      Bits 31-24  0111001M 
             Bits 23-16  VN 
             Bits 15-8   RA 
             Bits 7-0    RB or I 

             OP = 72, 73 

Description: If the SRCA operand is not equal to the SRCB operand, instruction 
             execution continues; otherwise, a trap with the specified vector 
             number occurs. 

             For programs in the User mode, a Protection Violation trap 
             occurs instead of the assert trap if a vector number between 0 
             and 63 is specified. 
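
The assert instructions share one pattern: a comparison, then either continuation or a trap. The Python sketch below models that pattern. It is a simplified illustration with hypothetical names (`assert_outcome`, `to_signed`), and it reads the User-mode rule as replacing the assert trap only when the assertion fails, which is an interpretation of the wording above rather than a confirmed detail.

```python
MASK32 = 0xFFFFFFFF


def to_signed(value):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


# Condition under which each assert instruction continues execution.
CONDITIONS = {
    "ASEQ": lambda a, b: a == b,
    "ASNEQ": lambda a, b: a != b,
    "ASGE": lambda a, b: to_signed(a) >= to_signed(b),
    "ASGT": lambda a, b: to_signed(a) > to_signed(b),
    "ASLE": lambda a, b: to_signed(a) <= to_signed(b),
    "ASLT": lambda a, b: to_signed(a) < to_signed(b),
    "ASGEU": lambda a, b: a >= b,
    "ASGTU": lambda a, b: a > b,
    "ASLEU": lambda a, b: a <= b,
    "ASLTU": lambda a, b: a < b,
}


def assert_outcome(mnemonic, vn, srca, srcb, user_mode=False):
    """Return 'continue', 'protection violation' or ('trap', vn)."""
    if CONDITIONS[mnemonic](srca & MASK32, srcb & MASK32):
        return "continue"
    if user_mode and 0 <= vn <= 63:
        return "protection violation"
    return ("trap", vn)


# 0xFFFFFFFF is -1 when signed, but the largest value when unsigned.
print(assert_outcome("ASGE", 64, 0xFFFFFFFF, 1))
print(assert_outcome("ASGEU", 64, 0xFFFFFFFF, 1))
```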
CALL         Call Subroutine 

Operation:   DEST <- PC//00 + 8 
             PC <- TARGET 
             Execute delay instruction 

Assembler 
Syntax:      CALL ra, target 

Status:      Not affected 

Operands:    TARGET  A = 0: I17...I10//I9...I2 (sign-extended to 30 bits) + PC 
                     A = 1: I17...I10//I9...I2 (zero-extended to 30 bits) 
             DEST    Register RA 

Format:      Bits 31-24  1010100A 
             Bits 23-16  I17...I10 
             Bits 15-8   RA 
             Bits 7-0    I9...I2 

             OP = A8, A9 

Description: The address of the second following instruction is placed into 
             the DEST location, and a non-sequential instruction fetch occurs 
             to the instruction address given by the TARGET operand. The 
             instruction following the CALL is executed before the 
             non-sequential fetch occurs. 
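
The TARGET and return-address computations for CALL can be sketched as follows. This Python model is illustrative only: the name `call_addresses` is hypothetical, and it assumes that PC//00 denotes the byte address of the CALL instruction itself, which is consistent with DEST receiving the address of the second following instruction.

```python
MASK32 = 0xFFFFFFFF
MASK30 = 0x3FFFFFFF


def call_addresses(call_addr, i17_10, i9_2, absolute):
    """Return (target byte address, return byte address) for a CALL.

    call_addr is the byte address of the CALL instruction (assumed to be
    PC//00); i17_10 and i9_2 are the two 8-bit halves of the word address.
    """
    word = ((i17_10 & 0xFF) << 8) | (i9_2 & 0xFF)
    if absolute:                       # A = 1: zero-extended word address
        target_word = word
    else:                              # A = 0: sign-extended offset + PC
        if word & 0x8000:
            word -= 0x10000
        target_word = ((call_addr >> 2) + word) & MASK30
    target = target_word << 2          # append 00 to form a byte address
    return_addr = (call_addr + 8) & MASK32
    return target, return_addr


target, ret = call_addresses(0x1000, 0x00, 0x04, absolute=False)
print(hex(target), hex(ret))
```

Because the offset is a word address, a 16-bit field reaches 32K instructions backward or forward in the relative form, and the first 64K instructions of memory in the absolute form.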
CALLI        Call Subroutine, Indirect 

Operation:   DEST <- PC//00 + 8 
             PC <- SRCB 
             Execute delay instruction 

Assembler 
Syntax:      CALLI ra, rb 

Status:      Not affected 

Operands:    SRCB  Content of register RB 
             DEST  Register RA 

Format:      Bits 31-24  11001000 
             Bits 23-16  Reserved 
             Bits 15-8   RA 
             Bits 7-0    RB 

             OP = C8 

Description: The address of the second following instruction is placed into 
             the DEST location, and a non-sequential instruction fetch occurs 
             to the instruction address given by the SRCB operand. The 
             instruction following the CALLI is executed before the 
             non-sequential fetch occurs. 
CLASS        Classify Floating-Point Operand 

Operation:   DEST <- CLASS(SRCA) 

Assembler 
Syntax:      CLASS rc, ra, FS 

Status:      None 

Operands:    SRCA  Content of register RA (single-precision f.p.) 
                   or 
                   Content of register RA and the twin of register RA 
                   (Double-precision f.p.) 
             DEST  Register RC 

Control:     FS    Format of source operand SRCA 
                   00  Reserved for future use 
                   01  Single-precision floating-point 
                   10  Double-precision floating-point 
                   11  Reserved for future use 

Format:      Bits 31-24  11100110 
             Bits 23-16  RC 
             Bits 15-8   RA 
             Bits 7-2    Reserved 
             Bits 1-0    FS 

             OP = E6 

Description: A 32-bit classification code for operand SRCA is placed into the 
             DEST location. Operand SRCA is a single- or double-precision 
             operand, as specified by FS. The classification code has the 
             following format: 

             Bits 31-6: Reserved (forced to 0). 

             Bit 5: Operand Sign (OS). The OS bit is 1 for a negative operand 
             (including negative zero) and 0 for a non-negative operand. 

             Bits 4-0: Exponent-Fraction Class (EFC). This field classifies the 
             biased exponent and fraction fields of the source operand as 
             follows (Max is the largest biased exponent that can be used to 
             represent a finite number. This exponent is 254 for the 
             single-precision format and 2,046 for the double-precision 
             format). 

             EFC     Biased Exp (bexp)       Fraction (frac)      Comments 

             00000   0                       0                    zero 
             00001                                                unused 
             00010   0                       0 < frac < .111...1  denormalized 
             00011   0                       .111...1             denormalized 

             00100   1                       0 
             00101                                                unused 
             00110   1                       0 < frac < .111...1 
             00111   1                       .111...1 

             01000   1 < bexp < Max          0 
             01001                                                unused 
             01010   1 < bexp < Max          0 < frac < .111...1 
             01011   1 < bexp < Max          .111...1 

             01100   Max                     0 
             01101                                                unused 
             01110   Max                     0 < frac < .111...1 
             01111   Max                     .111...1 

             10000   Max + 1                 0                    infinity 
             10001                                                unused 
             10010   Max + 1, frac MSB = 0   <> 0                 SNaN 
             10011   Max + 1, frac MSB = 1   <> 0                 QNaN 

             Note: Max is the largest biased exponent that can be used to 
             represent a finite number in a given format. Max is 254 for 
             single-precision and 2,046 for double-precision. 

             Executing the CLASS instruction causes a pipeline hold of one 
             cycle, until the intermediate result enters the denormalizer of 
             the Floating-Point Unit. 
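
The classification code can be modeled in Python as a function of the raw operand bits. This is a sketch, not a verified emulation: the function name `class_code` is hypothetical, and the EFC encoding assumes that the upper three bits select the exponent range and the lower two bits select the fraction case, with zeros filled in where the printed table leaves entries blank.

```python
def class_code(bits, exp_bits=8, frac_bits=23):
    """Return the 6-bit OS//EFC code for a raw floating-point bit pattern.

    Defaults describe the single-precision format; pass exp_bits=11 and
    frac_bits=52 for double precision.
    """
    sign = (bits >> (exp_bits + frac_bits)) & 1
    bexp = (bits >> frac_bits) & ((1 << exp_bits) - 1)
    frac = bits & ((1 << frac_bits) - 1)
    max_exp = (1 << exp_bits) - 2          # 254 single, 2046 double
    all_ones = (1 << frac_bits) - 1        # fraction .111...1

    if bexp == 0:
        group = 0b000
    elif bexp == 1:
        group = 0b001
    elif bexp < max_exp:
        group = 0b010
    elif bexp == max_exp:
        group = 0b011
    else:                                  # bexp == Max + 1
        group = 0b100

    if frac == 0:
        case = 0b00
    elif group == 0b100:                   # NaN: fraction MSB picks SNaN/QNaN
        case = 0b11 if frac >> (frac_bits - 1) else 0b10
    else:
        case = 0b11 if frac == all_ones else 0b10

    return (sign << 5) | (group << 2) | case


print(format(class_code(0x3F800000), "06b"))   # 1.0 in single precision
```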



INSTRUCTION SET 8-31 



CLZ          Count Leading Zeros 

Operation:   Determine number of leading zeros in a word 

Assembler 
Syntax:      CLZ rc, rb 
             or 
             CLZ rc, const8 

Status:      Not affected 

Operands:    SRCB  M = 0: Content of register RB 
                   M = 1: I (Zero-extended to 32 bits) 
             DEST  Register RC 

Format:      Bits 31-24  0000100M 
             Bits 23-16  RC 
             Bits 15-8   Reserved 
             Bits 7-0    RB or I 

             OP = 08, 09 

Description: A count of the number of zero-bits to the first one-bit in the 
             SRCB operand is placed into the DEST location. If the 
             most-significant bit of the SRCB operand is 1, the resulting 
             count is zero. If the SRCB operand is zero, the resulting count 
             is 32. 
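
The count described above can be expressed compactly in Python. This is a minimal sketch with a hypothetical function name; it relies on the standard `int.bit_length` method, which gives the position of the highest one-bit.

```python
def clz32(srcb):
    """Number of zero-bits above the first one-bit of a 32-bit word."""
    srcb &= 0xFFFFFFFF
    return 32 - srcb.bit_length()


for word in (0x80000000, 0x00010000, 0x00000001, 0x00000000):
    print(hex(word), clz32(word))
```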






CONST        Constant 

Operation:   DEST <- 0I16 

Assembler 
Syntax:      CONST ra, const16 

Status:      Not affected 

Operands:    0I16  I15...I8//I7...I0 (Zero-extended to 32 bits) 
             DEST  Register RA 

Format:      Bits 31-24  00000011 
             Bits 23-16  I15...I8 
             Bits 15-8   RA 
             Bits 7-0    I7...I0 

             OP = 03 

Description: The 0I16 operand is placed into the DEST location. 






CONSTH       Constant, High 

Operation:   Replace high-order half-word of SRCA by I16 

Assembler 
Syntax:      CONSTH ra, const16 

Status:      Not affected 

Operands:    SRCA  Content of register RA 
             I16   I15...I8//I7...I0 
             DEST  Register RA 

Format:      Bits 31-24  00000010 
             Bits 23-16  I15...I8 
             Bits 15-8   RA 
             Bits 7-0    I7...I0 

             OP = 02 

Description: The low-order half-word of the SRCA operand is appended to the 
             I16 operand, and the result is placed into the DEST operand. 
             Note that the destination register for this instruction is the 
             same as the source register. 






CONSTHZ      Constant High, Zero Lower 

Operation:   DEST <- I16 << 16 

Assembler 
Syntax:      CONSTHZ ra, const16 

Status:      Not affected 

Operands:    I16   I15...I8//I7...I0 
             DEST  Register RA 

Format:      Bits 31-24  00000101 
             Bits 23-16  I15...I8 
             Bits 15-8   RA 
             Bits 7-0    I7...I0 

             OP = 05 

Description: The I16 operand is placed into the upper 16 bits of the DEST 
             location; the lower 16 bits of the DEST location are replaced 
             with zeros. 






CONSTN       Constant, Negative 

Operation:   DEST <- 1I16 

Assembler 
Syntax:      CONSTN ra, const16 

Status:      Not affected 

Operands:    1I16  I15...I8//I7...I0 (ones-extended to 32 bits) 
             DEST  Register RA 

Format:      Bits 31-24  00000001 
             Bits 23-16  I15...I8 
             Bits 15-8   RA 
             Bits 7-0    I7...I0 

             OP = 01 

Description: The 1I16 operand is placed into the DEST location. 
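
The four constant instructions differ only in how the 16-bit field is placed and extended. The Python sketch below models each one and shows how an arbitrary 32-bit value can be built from a CONST followed by a CONSTH. The function names are hypothetical, and pairing the two instructions in this way is a usage suggestion inferred from the descriptions rather than a sequence stated in the text.

```python
def const(i16):
    """CONST: I16 zero-extended to 32 bits."""
    return i16 & 0xFFFF


def consth(srca, i16):
    """CONSTH: replace the high-order half-word of SRCA by I16."""
    return ((i16 & 0xFFFF) << 16) | (srca & 0xFFFF)


def consthz(i16):
    """CONSTHZ: I16 in the upper 16 bits, zeros in the lower 16 bits."""
    return (i16 & 0xFFFF) << 16


def constn(i16):
    """CONSTN: I16 ones-extended to 32 bits."""
    return 0xFFFF0000 | (i16 & 0xFFFF)


def load32(value):
    """Build a full 32-bit constant: CONST (low half), then CONSTH (high half)."""
    register = const(value & 0xFFFF)
    return consth(register, (value >> 16) & 0xFFFF)


print(hex(load32(0x12345678)), hex(constn(0xFFFE)), hex(consthz(0x8000)))
```

In this model a small negative value such as -2 needs only a single CONSTN, and a value whose low half-word is zero needs only a single CONSTHZ.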






CONVERT      Convert Data Format 

Operation:   DEST <- SRCA, with format modified per UI, RND, FD, FS 

Assembler 
Syntax:      CONVERT rc, ra, UI, RND, FD, FS 

Status:      fpX, fpU, fpV, fpR, fpN 

Operands:    SRCA  Content of register RA (single-precision f.p.) 
                   or 
                   Content of register RA and the twin of register RA 
                   (Double-precision f.p.) 
             DEST  Content of register RC (single-precision f.p.) 
                   or 
                   Content of register RC and the twin of register RC 
                   (Double-precision f.p.) 

Control:     UI      0 = signed integer 
                     1 = unsigned integer 

             RND     Round mode 
                     000      Round to nearest 
                     001      Round to minus infinity 
                     010      Round to plus infinity 
                     011      Round to zero 
                     100      Round using f.p. round mode (FRM) 
                     101-111  Reserved 

             FS, FD  Format of source operand, format of destination operand 
                     00  Integer 
                     01  Single-precision floating-point 
                     10  Double-precision floating-point 
                     11  Reserved 

Format:      Bits 31-24  11100100 
             Bits 23-16  RC 
             Bits 15-8   RA 
             Bit 7       UI 
             Bits 6-4    RND 
             Bits 3-2    FD 
             Bits 1-0    FS 

             OP = E4 

Description: The SRCA operand with format FS is converted to format FD and 
             rounded according to RND, then placed into the DEST location. If 
             the source or destination operand is an integer, it is a signed 
             or unsigned value according to the value of UI. 

             Note: Converting from format to like format is not supported, and 
             will produce unpredictable results. 
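
The UI//RND//FD//FS control byte can be assembled as shown in this Python sketch. The function name and the choice to raise an error for reserved or like-to-like encodings are assumptions made for illustration; the hardware itself is described only as producing unpredictable results in the like-to-like case.

```python
INTEGER, SINGLE, DOUBLE = 0b00, 0b01, 0b10


def convert_control(ui, rnd, fd, fs):
    """Pack the CONVERT control fields into bits 7-0 of the instruction."""
    if ui not in (0, 1):
        raise ValueError("UI must be 0 (signed) or 1 (unsigned)")
    if not 0b000 <= rnd <= 0b100:
        raise ValueError("RND values 101-111 are reserved")
    if fd not in (INTEGER, SINGLE, DOUBLE) or fs not in (INTEGER, SINGLE, DOUBLE):
        raise ValueError("format 11 is reserved")
    if fd == fs:
        raise ValueError("converting from format to like format is not supported")
    return (ui << 7) | (rnd << 4) | (fd << 2) | fs


# Signed integer to single precision, using the f.p. round mode (FRM).
print(hex(convert_control(ui=0, rnd=0b100, fd=SINGLE, fs=INTEGER)))
```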






CPBYTE       Compare Bytes 

Operation:   IF (SRCA.BYTE0 = SRCB.BYTE0) OR 
                (SRCA.BYTE1 = SRCB.BYTE1) OR 
                (SRCA.BYTE2 = SRCB.BYTE2) OR 
                (SRCA.BYTE3 = SRCB.BYTE3) THEN DEST <- TRUE 
             ELSE DEST <- FALSE 

Assembler 
Syntax:      CPBYTE rc, ra, rb 
             or 
             CPBYTE rc, ra, const8 

Status:      Not affected 

Operands:    SRCA  Content of register RA 
             SRCB  M = 0: Content of register RB 
                   M = 1: I (Zero-extended to 32 bits) 
             DEST  Register RC 

Format:      Bits 31-24  0010111M 
             Bits 23-16  RC 
             Bits 15-8   RA 
             Bits 7-0    RB or I 

             OP = 2E, 2F 

Description: Each byte of the SRCA operand is compared to the corresponding 
             byte of the SRCB operand. If any corresponding bytes are equal, a 
             Boolean TRUE is placed into the DEST location; otherwise, a 
             Boolean FALSE is placed into the DEST location. 
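
The byte-wise comparison can be modeled in a few lines of Python. The function name is hypothetical, and the model returns a Python bool rather than the processor's Boolean encoding, which is not described in this entry.

```python
def cpbyte(srca, srcb):
    """True if any of the four corresponding byte pairs are equal."""
    return any(
        ((srca >> shift) & 0xFF) == ((srcb >> shift) & 0xFF)
        for shift in (0, 8, 16, 24)
    )


# Comparing against zero tests a word for a zero byte, for example when
# scanning a null-terminated string one word at a time.
print(cpbyte(0x41420043, 0), cpbyte(0x41424344, 0))
```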






CPEQ         Compare Equal To 

Operation:   IF SRCA = SRCB THEN DEST <- TRUE 
             ELSE DEST <- FALSE 

Assembler 
Syntax:      CPEQ rc, ra, rb 
             or 
             CPEQ rc, ra, const8 

Status:      Not affected 

Operands:    SRCA  Content of register RA 
             SRCB  M = 0: Content of register RB 
                   M = 1: I (Zero-extended to 32 bits) 
             DEST  Register RC 

Format:      Bits 31-24  0110000M 
             Bits 23-16  RC 
             Bits 15-8   RA 
             Bits 7-0    RB or I 

             OP = 60, 61 

Description: If the SRCA operand is equal to the SRCB operand, a Boolean TRUE 
             is placed into the DEST location; otherwise, a Boolean FALSE is 
             placed into the DEST location. 






CPGE         Compare Greater Than or Equal To 

Operation:   IF SRCA >= SRCB THEN DEST <- TRUE 
             ELSE DEST <- FALSE 

Assembler 
Syntax:      CPGE rc, ra, rb 
             or 
             CPGE rc, ra, const8 

Status:      Not affected 

Operands:    SRCA  Content of register RA 
             SRCB  M = 0: Content of register RB 
                   M = 1: I (Zero-extended to 32 bits) 
             DEST  Register RC 

Format:      Bits 31-24  0100110M 
             Bits 23-16  RC 
             Bits 15-8   RA 
             Bits 7-0    RB or I 

             OP = 4C, 4D 

Description: If the value of the SRCA operand is greater than or equal to the 
             value of the SRCB operand, a Boolean TRUE is placed into the DEST 
             location; otherwise, a Boolean FALSE is placed into the DEST 
             location. 






CPGEU 



Compare Greater Than or Equal To, Unsigned 



CPGEU 



Operation: IF SRCA >= SRCB (unsigned) THEN DEST <- TRUE
ELSE DEST <- FALSE

Assembler
Syntax: CPGEU rc, ra, rb
or
CPGEU rc, ra, const8

Status: Not affected

Operands: SRCA Content of register RA

SRCB M = 0: Content of register RB
M = 1: I (Zero-extended to 32 bits)

DEST Register RC

Instruction format (bits 31-24 | 23-16 | 15-8 | 7-0): 0100111M | RC | RA | RB or I

OP = 4E, 4F



CPGEU 



Description: If the value of the SRCA operand is greater than or equal to the value
of the SRCB operand, a Boolean TRUE is placed into the DEST
location; otherwise, a Boolean FALSE is placed into the DEST
location. For the comparison, both operands are treated as unsigned
integers.
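
The difference between the signed and unsigned compares can be seen on a single bit pattern. The sketch below is an illustration only: the function names mirror the mnemonics but are hypothetical helpers, and the result is returned as a Python bool rather than as the processor's Boolean encoding in DEST, which is not modeled here.

```python
MASK32 = 0xFFFFFFFF


def as_signed(word):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    word &= MASK32
    return word - (1 << 32) if word & 0x80000000 else word


def cpge(srca, srcb):
    """Signed greater-than-or-equal, as in CPGE."""
    return as_signed(srca) >= as_signed(srcb)


def cpgeu(srca, srcb):
    """Unsigned greater-than-or-equal, as in CPGEU."""
    return (srca & MASK32) >= (srcb & MASK32)


# 0xFFFFFFFF is -1 when signed, but the largest value when unsigned.
print(cpge(0xFFFFFFFF, 1))   # prints False
print(cpgeu(0xFFFFFFFF, 1))  # prints True
```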






CPGT 



Compare Greater Than 



CPGT 



Operation: IF SRCA > SRCB THEN DEST <- TRUE
ELSE DEST <- FALSE

Assembler
Syntax: CPGT rc, ra, rb
or
CPGT rc, ra, const8

Status: Not affected

Operands: SRCA Content of register RA

SRCB M = 0: Content of register RB
M = 1: I (Zero-extended to 32 bits)

DEST Register RC

Instruction format (bits 31-24 | 23-16 | 15-8 | 7-0): 0100100M | RC | RA | RB or I

OP = 48, 49



CPGT 



Description: If the value of the SRCA operand is greater than the value of the
SRCB operand, a Boolean TRUE is placed into the DEST location;
otherwise, a Boolean FALSE is placed into the DEST location.






CPGTU 



CPGTU 



Compare Greater Than, Unsigned 



Operation: IF SRCA > SRCB (unsigned) THEN DEST <- TRUE
ELSE DEST <- FALSE

Assembler
Syntax: CPGTU rc, ra, rb
or
CPGTU rc, ra, const8

Status: Not affected

Operands: SRCA Content of register RA

SRCB M = 0: Content of register RB
M = 1: I (Zero-extended to 32 bits)

DEST Register RC

Instruction format (bits 31-24 | 23-16 | 15-8 | 7-0): 0100101M | RC | RA | RB or I

OP = 4A, 4B



CPGTU 



Description: If the value of the SRCA operand is greater than the value of the
SRCB operand, a Boolean TRUE is placed into the DEST location;
otherwise, a Boolean FALSE is placed into the DEST location. For
the comparison, both operands are treated as unsigned integers.






CPLE 



CPLE



Compare Less Than or Equal To



Operation: IF SRCA <= SRCB THEN DEST <- TRUE
ELSE DEST <- FALSE

Assembler
Syntax: CPLE rc, ra, rb
or
CPLE rc, ra, const8

Status: Not affected

Operands: SRCA Content of register RA

SRCB M = 0: Content of register RB
M = 1: I (Zero-extended to 32 bits)

DEST Register RC

Instruction format (bits 31-24 | 23-16 | 15-8 | 7-0): 0100010M | RC | RA | RB or I

OP = 44, 45



CPLE 



Description: If the value of the SRCA operand is less than or equal to the value of
the SRCB operand, a Boolean TRUE is placed into the DEST
location; otherwise, a Boolean FALSE is placed into the DEST
location.






CPLEU 



CPLEU 



Compare Less Than or Equal To, Unsigned 



Operation: IF SRCA <= SRCB (unsigned) THEN DEST <- TRUE
ELSE DEST <- FALSE

Assembler
Syntax: CPLEU rc, ra, rb
or
CPLEU rc, ra, const8

Status: Not affected

Operands: SRCA Content of register RA

SRCB M = 0: Content of register RB
M = 1: I (Zero-extended to 32 bits)

DEST Register RC

Instruction format (bits 31-24 | 23-16 | 15-8 | 7-0): 0100011M | RC | RA | RB or I

OP = 46, 47



CPLEU 



Description: If the value of the SRCA operand is less than or equal to the value of
the SRCB operand, a Boolean TRUE is placed into the DEST
location; otherwise, a Boolean FALSE is placed into the DEST
location. For the comparison, both operands are treated as unsigned
integers.






CPLT 



Compare Less Than 



CPLT 



Operation: IF SRCA < SRCB THEN DEST <- TRUE
ELSE DEST <- FALSE

Assembler
Syntax: CPLT rc, ra, rb
or
CPLT rc, ra, const8

Status: Not affected

Operands: SRCA Content of register RA

SRCB M = 0: Content of register RB
M = 1: I (Zero-extended to 32 bits)

DEST Register RC

Instruction format (bits 31-24 | 23-16 | 15-8 | 7-0): 0100000M | RC | RA | RB or I

OP = 40, 41



CPLT 



Description: If the value of the SRCA operand is less than the value of the SRCB
operand, a Boolean TRUE is placed into the DEST location;
otherwise, a Boolean FALSE is placed into the DEST location.






CPLTU 



CPLTU



Compare Less Than, Unsigned



Operation: IF SRCA < SRCB (unsigned) THEN DEST <- TRUE
ELSE DEST <- FALSE

Assembler
Syntax: CPLTU rc, ra, rb
or
CPLTU rc, ra, const8

Status: Not affected

Operands: SRCA Content of register RA

SRCB M = 0: Content of register RB
M = 1: I (Zero-extended to 32 bits)

DEST Register RC

Instruction format (bits 31-24 | 23-16 | 15-8 | 7-0): 0100001M | RC | RA | RB or I

OP = 42, 43



CPLTU 



Description: If the value of the SRCA operand is less than the value of the SRCB
operand, a Boolean TRUE is placed into the DEST location;
otherwise, a Boolean FALSE is placed into the DEST location. For
the comparison, both operands are treated as unsigned integers.






CPNEQ 



Compare Not Equal To 



CPNEQ 



Operation: IF SRCA <> SRCB THEN DEST <- TRUE
ELSE DEST <- FALSE

Assembler
Syntax: CPNEQ rc, ra, rb
or
CPNEQ rc, ra, const8

Status: Not affected

Operands: SRCA Content of register RA

SRCB M = 0: Content of register RB
M = 1: I (Zero-extended to 32 bits)

DEST Register RC

Instruction format (bits 31-24 | 23-16 | 15-8 | 7-0): 0110001M | RC | RA | RB or I

OP = 62, 63



CPNEQ 



Description: If the SRCA operand is not equal to the SRCB operand, a Boolean
TRUE is placed into the DEST location; otherwise, a Boolean FALSE
is placed into the DEST location.






DADD 



DADD 



Floating-Point Add, Double-Precision 



Operation: DEST (double-precision) <- SRCA (double-precision) +
SRCB (double-precision)

Assembler
Syntax: DADD rc, ra, rb

Status: fpX, fpU, fpV, fpR, fpN

Operands: SRCA Content of register RA and the twin of register RA

SRCB Content of register RB and the twin of register RB

DEST Register RC and the twin of register RC

Instruction format (bits 31-24 | 23-16 | 15-8 | 7-0): 11110001 | RC | RA | RB

OP = F1



DADD 



Description: The SRCA operand is added to the SRCB operand; the result is 
rounded according to FRM field of the Floating-Point Environment 
Register and placed into the DEST location. The operands and result 
of the addition are double-precision floating-point numbers. 






DDIV 



DDIV



Floating-Point Divide, Double-Precision

Operation: DEST (double-precision) <- SRCA (double-precision) /
SRCB (double-precision)

Assembler
Syntax: DDIV rc, ra, rb

Status: fpD, fpX, fpU, fpV, fpR, fpN

Operands: SRCA Content of register RA and the twin of register RA

SRCB Content of register RB and the twin of register RB

DEST Register RC and the twin of register RC

Instruction format (bits 31-24 | 23-16 | 15-8 | 7-0): 11110111 | RC | RA | RB

OP = F7



DDIV 



Description: The SRCA operand is divided by the SRCB operand; the result is 
rounded according to FRM field of the Floating-Point Environment 
Register and placed into the DEST location. The operands and result 
of the division are double-precision floating-point numbers. 






DEQ 

DEQ



Floating-Point Equal To, Double-Precision

Operation: IF SRCA (double-precision) = SRCB (double-precision)
THEN DEST <- TRUE
ELSE DEST <- FALSE

Assembler
Syntax: DEQ rc, ra, rb

Status: fpl

Operands: SRCA Content of register RA and the twin of register RA

SRCB Content of register RB and the twin of register RB

DEST Register RC

Instruction format (bits 31-24 | 23-16 | 15-8 | 7-0): 11101011 | RC | RA | RB

OP = EB



DEQ 



Description: If the SRCA operand is equal to the SRCB operand, a Boolean TRUE
is placed into the DEST location; otherwise, a Boolean FALSE is
placed into the DEST location. SRCA and SRCB are double-precision
floating-point numbers.

Note: The rounding mode specified by the FRM field of the 
Floating-Point Environment Register has no effect on this operation. 






DGE DGE 

Floating-Point Greater Than Or Equal To, Double-Precision 

Operation: IF SRCA (double-precision) >= SRCB (double-precision)
THEN DEST <- TRUE
ELSE DEST <- FALSE

Assembler
Syntax: DGE rc, ra, rb

Status: fpl

Operands: SRCA Content of register RA and the twin of register RA

SRCB Content of register RB and the twin of register RB

DEST Register RC

Instruction format (bits 31-24 | 23-16 | 15-8 | 7-0): 11101111 | RC | RA | RB

OP = EF



DGE 



Description: If the SRCA operand is greater than or equal to the SRCB operand, a
Boolean TRUE is placed into the DEST location; otherwise, a
Boolean FALSE is placed into the DEST location. SRCA and SRCB
are double-precision floating-point numbers.

Note: The rounding mode specified by the FRM field of the 
Floating-Point Environment Register has no effect on this operation. 






DGT 



DGT 



Floating-Point Greater Than, Double-Precision 



Operation: IF SRCA (double-precision) > SRCB (double-precision)
THEN DEST <- TRUE
ELSE DEST <- FALSE

Assembler
Syntax: DGT rc, ra, rb

Status: fpl

Operands: SRCA Content of register RA and the twin of register RA

SRCB Content of register RB and the twin of register RB

DEST Register RC

Instruction format (bits 31-24 | 23-16 | 15-8 | 7-0): 11101101 | RC | RA | RB

OP = ED



DGT 



Description: If the SRCA operand is greater than the SRCB operand, a Boolean
TRUE is placed into the DEST location; otherwise, a Boolean FALSE
is placed into the DEST location. SRCA and SRCB are
double-precision floating-point numbers.

Note: The rounding mode specified by the FRM field of the 
Floating-Point Environment Register has no effect on this operation. 






DIV 



DIV



Divide Step

Operation: Perform one-bit step of a divide operation (unsigned)

Assembler
Syntax: DIV rc, ra, rb
or
DIV rc, ra, const8

Status: V, N, Z, C

Operands: SRCA Content of register RA

SRCB M = 0: Content of register RB
M = 1: I (Zero-extended to 32 bits)

DEST Register RC

Instruction format (bits 31-24 | 23-16 | 15-8 | 7-0): 0110101M | RC | RA | RB or I

OP = 6A, 6B



DIV 



Description: If the Divide Flag (DF) bit of the ALU Status Register is 1, the SRCB
operand is subtracted from the SRCA operand. If the DF bit is 0, the
SRCB operand is added to the SRCA operand.

The carry-out of the add or subtract operation is exclusive-ORed with
the value of the DF bit and the value of the Negative (N) bit of the
ALU Status Register; the resulting value is complemented and placed
into the DF bit. The sign of the result of the add or subtract is placed
into the N bit.

The content of the Q Register is appended to the result of the add or
subtract, and the resulting 64-bit value is shifted left by one bit
position; the value computed for the DF bit above fills the vacated bit
position. The high-order 32 bits of the 64-bit shifted value are placed
into the DEST location. The low-order 32 bits of the shifted value are
placed into the Q Register.

Examples of integer divide operations appear in Section 7.2.6. 






DIV0



DIV0



Divide Initialize 



Operation: Initialize for a sequence of divide steps (unsigned)

Assembler
Syntax: DIV0 rc, rb
or
DIV0 rc, const8

Status: V, N, Z, C

Operands: SRCB M = 0: Content of register RB
M = 1: I (Zero-extended to 32 bits)

DEST Register RC

Instruction format (bits 31-24 | 23-16 | 15-8 | 7-0): 0110100M | RC | Reserved | RB or I

OP = 68, 69



DIV0



Description: The Divide Flag (DF) bit of the ALU Status Register is set. The sign of
the SRCB operand is placed into the Negative bit of the ALU Status
Register.

The content of the Q Register is appended to the SRCB operand, and
the resulting 64-bit value is shifted left by one bit position; a 0 fills the
vacated bit position. The high-order 32 bits of the 64-bit shifted value
are placed into the DEST location. The low-order 32 bits of the shifted
value are placed into the Q Register.

Examples of integer divide operations appear in Section 7.2.6.






DIVIDE 



DIVIDE 



Integer Divide, Signed 



Operation: DEST <- (Q//SRCA) / SRCB (signed)
Q <- Remainder

Assembler
Syntax: DIVIDE rc, ra, rb

Status: Not affected

Operands: Q Content of the Q Register

SRCA Content of register RA

SRCB Content of register RB

DEST Register RC

Instruction format (bits 31-24 | 23-16 | 15-8 | 7-0): 11100001 | RC | RA | RB

OP = E1



DIVIDE 



Description: The SRCA operand is appended to the content of the Q Register. The
resulting 64-bit value is divided by the SRCB operand, and the result
is placed into the DEST location. This operation treats the operands
as signed two's-complement integers and produces a signed
two's-complement result.

The remainder is placed into the Q Register. A non-zero remainder
always has the same sign as the dividend.

This instruction does not check for a divide overflow condition.
Checking for divide overflow must occur before the instruction is
executed.

Note: This instruction is not supported directly in processor hardware.
In the Am29050 microprocessor, this instruction causes a DIVIDE
trap. When the trap occurs, the IPA, IPB, and IPC registers are set to
reference SRCA, SRCB, and DEST.
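
Because DIVIDE traps to software, an emulation routine has to reproduce the arithmetic described for it. The following is a minimal sketch of that arithmetic only, not the actual trap handler; the function names are hypothetical. It assumes the quotient is truncated toward zero, which is what the stated rule (a non-zero remainder has the sign of the dividend) implies, and like the instruction it performs no overflow check.

```python
MASK32 = 0xFFFFFFFF


def to_signed(value, bits):
    """Interpret `value` as a two's-complement integer of width `bits`."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def divide(q, srca, srcb):
    """Return (DEST, Q) for a signed divide of (Q//SRCA) by SRCB.

    Python's // rounds toward negative infinity, so the quotient is
    built from magnitudes to obtain truncation toward zero.
    A zero divisor raises ZeroDivisionError; that case is not modeled.
    """
    dividend = to_signed(((q & MASK32) << 32) | (srca & MASK32), 64)
    divisor = to_signed(srcb, 32)
    quotient = abs(dividend) // abs(divisor)
    if (dividend < 0) != (divisor < 0):
        quotient = -quotient
    remainder = dividend - quotient * divisor
    return quotient & MASK32, remainder & MASK32


# -7 / 2: quotient -3, remainder -1 (same sign as the dividend).
dest, q = divide(0xFFFFFFFF, 0xFFFFFFF9, 2)
print(f"{dest:08X} {q:08X}")  # prints FFFFFFFD FFFFFFFF
```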

DIVIDU

Integer Divide, Unsigned

Operation: DEST <- (Q//SRCA) / SRCB (unsigned)
Q <- Remainder

Assembler
Syntax: DIVIDU rc, ra, rb

Status: Not affected

Operands: Q Content of the Q Register

SRCA Content of register RA

SRCB Content of register RB

DEST Register RC

Instruction format (bits 31-24 | 23-16 | 15-8 | 7-0): 11100011 | RC | RA | RB

OP = E3



DIVIDU 



Description: The SRCA operand is appended to the content of the Q Register. The 
resulting 64-bit value is divided by the SRCB operand, and the result 
is placed into the DEST location. This operation treats the operands 
as unsigned integers, and produces an unsigned result. 

The remainder is placed into the Q Register. The remainder is also
unsigned.

Note: This instruction is not supported directly in processor hardware. 
In the Am29050 microprocessor, this instruction causes a DIVIDU 
trap. When the trap occurs, the IPA, IPB, and IPC registers are set to 
reference SRCA, SRCB, and DEST. 
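
The unsigned case is simpler to model. This sketch (the function name dividu is a hypothetical helper, not the trap handler itself) forms the 64-bit dividend from Q and SRCA and returns the quotient truncated to 32 bits together with the remainder; as with the instruction, overflow of the quotient is not checked.

```python
MASK32 = 0xFFFFFFFF


def dividu(q, srca, srcb):
    """Return (DEST, Q) for an unsigned divide of (Q//SRCA) by SRCB."""
    dividend = ((q & MASK32) << 32) | (srca & MASK32)
    divisor = srcb & MASK32
    # A zero divisor raises ZeroDivisionError; that case is not modeled.
    quotient, remainder = divmod(dividend, divisor)
    return quotient & MASK32, remainder


print(dividu(0, 100, 7))  # prints (14, 2)
```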

DIVL

Divide Last Step

Operation: Complete a sequence of divide steps (unsigned)

Assembler
Syntax: DIVL rc, ra, rb

Status: V, N, Z, C

Operands: SRCA Content of register RA

SRCB M = 0: Content of register RB
M = 1: I (Zero-extended to 32 bits)

DEST Register RC

Instruction format (bits 31-24 | 23-16 | 15-8 | 7-0): 0110110M | RC | RA | RB or I

OP = 6C, 6D



Description: If the Divide Flag (DF) bit of the ALU Status Register is 1, the SRCB
operand is subtracted from the SRCA operand. If the DF bit is 0, the
SRCB operand is added to the SRCA operand. The result is placed
into the DEST location.

The carry-out of the add or subtract operation is exclusive-ORed with
the value of the DF bit and the value of the Negative (N) bit of the
ALU Status Register; the resulting value is complemented and placed
into the DF bit. The sign of the result of the add or subtract is placed
into the N bit.

The content of the Q Register is shifted left by one bit position; the
value computed for the DF bit above fills the vacated bit position. The
shifted value is placed into the Q Register.

Examples of integer divide operations appear in Section 7.2.6. 

DIVREM

Divide Remainder

Operation: Generate remainder for divide operation (unsigned)

Assembler
Syntax: DIVREM rc, ra, rb
or
DIVREM rc, ra, const8

Status: V, N, Z, C

Operands: SRCA Content of register RA

SRCB M = 0: Content of register RB
M = 1: I (Zero-extended to 32 bits)

DEST Register RC

Instruction format (bits 31-24 | 23-16 | 15-8 | 7-0): 0110111M | RC | RA | RB or I

OP = 6E, 6F



DIVREM 



Description: If the Divide Flag (DF) bit of the ALU Status Register is 1, the SRCA
operand is placed into the DEST location.

If the DF bit is 0, the SRCB operand is added to the SRCA operand,
and the result is placed into the DEST location.

Examples of integer divide operations appear in Section 7.2.6.
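
The four divide-step instructions (DIV0, DIV, DIVL, DIVREM) are meant to be used together. The sketch below models their descriptions as Python functions and chains them into a 32-bit unsigned divide. It is an illustration under stated assumptions, not the sequence from Section 7.2.6: it assumes one DIV0 with a zero operand, 31 DIV steps, one DIVL, and one DIVREM, and it assumes the carry-out of a subtract is the carry of SRCA + NOT(SRCB) + 1. The class and function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


class DivState:
    """Q Register plus the DF and N bits of the ALU Status Register."""

    def __init__(self, q):
        self.q = q & MASK32
        self.df = 0
        self.n = 0


def div0(st, srcb):
    st.df = 1
    st.n = (srcb >> 31) & 1
    wide = (((srcb & MASK32) << 32) | st.q) << 1  # a 0 fills the low bit
    st.q = wide & MASK32
    return (wide >> 32) & MASK32


def _add_or_subtract(st, srca, srcb):
    """Shared add/subtract step; updates DF and N, returns the result."""
    if st.df:
        total = srca + (~srcb & MASK32) + 1  # subtract (assumed carry rule)
    else:
        total = srca + srcb
    carry = (total >> 32) & 1
    result = total & MASK32
    st.df = 1 ^ (carry ^ st.df ^ st.n)  # complemented three-way XOR
    st.n = result >> 31
    return result


def div(st, srca, srcb):
    result = _add_or_subtract(st, srca, srcb)
    wide = (((result << 32) | st.q) << 1) | st.df
    st.q = wide & MASK32
    return (wide >> 32) & MASK32


def divl(st, srca, srcb):
    result = _add_or_subtract(st, srca, srcb)
    st.q = ((st.q << 1) | st.df) & MASK32
    return result


def divrem(st, srca, srcb):
    return srca if st.df else (srca + srcb) & MASK32


def udiv32(dividend, divisor):
    """Return (quotient, remainder) using the modeled step sequence."""
    st = DivState(dividend)
    partial = div0(st, 0)
    for _ in range(31):
        partial = div(st, partial, divisor)
    partial = divl(st, partial, divisor)
    remainder = divrem(st, partial, divisor)
    return st.q, remainder


print(udiv32(100, 7))  # prints (14, 2)
```

In this model the quotient accumulates in the Q Register one bit per step, and DIVREM corrects the final partial remainder when the last step left it negative.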

DMAC

Floating-Point Multiply-Accumulate, Double-Precision

Operation: ACC(ACN) (double-precision) <- SRCA (double-precision) * SRCB
(double-precision) + ACC(ACN) (double-precision)

Assembler
Syntax: DMAC FUNC, ACN, ra, rb

Status: fpU, fpV, fpR, fpN

Operands: SRCA Content of register RA and the twin of register RA

SRCB Content of register RB and the twin of register RB

ACC(ACN) (Content of) Accumulator register ACN

Control: FUNC Modifies operation as shown in the table below

ACN Accumulator register number (0, 1, 2, 3)

Instruction format (bits 31-24 | 23-22 | 21-18 | 17-16 | 15-8 | 7-0): 11011001 | Res | FUNC | ACN | RA | RB

OP = D9



DMAC 



Description: A compound operation of the form (OP1 * OP2) + OP3 is performed,
where OP1, OP2, and OP3 are double-precision operands. Operand
sources and optional sign changes are specified by FUNC, as
described in the table below. The result is rounded and stored in
ACC(ACN), in double-precision format. The Accumulator Format
(ACF) field of the Floating-Point Environment Register must specify
double-precision.

Note that the DMAC instruction uses the fast float mode of operation,
regardless of the state of the Fast Float Select bit in the
Floating-Point Environment Register. The DMAC instruction never
causes a Floating-Point Exception trap; it updates the sticky status
bits instead. Furthermore, the DMAC instruction never sets the
Inexact Sticky bit, regardless of the result.

FUNC    Operation Performed

0000    (SRCA * SRCB)    +  ACC(ACN)
0001    (SRCA * -SRCB)   +  ACC(ACN)
0010    (SRCA * SRCB)    -  ACC(ACN)
0011    (SRCA * -SRCB)   -  ACC(ACN)
0100    (SRCA * SRCB)    +  0.0
0101    (SRCA * -SRCB)   +  0.0
0110    (SRCA * SRCB)    -  0.0
0111    (SRCA * -SRCB)   -  0.0
1000    (SRCA * 1.0)     +  ACC(ACN)
1001    (SRCA * -1.0)    +  ACC(ACN)
1010    (SRCA * 1.0)     -  ACC(ACN)
1011    (SRCA * -1.0)    -  ACC(ACN)
1100    (SRCA * 1.0)        0.0
1101    (SRCA * 1.0)        0.0
1110    (SRCA * 1.0)        0.0
1111    (SRCA * 1.0)        0.0
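
The FUNC table can be read as a small decoder. The sketch below covers only FUNC values 0000 through 1011, whose rows are fully legible in the table; it is an illustration, and the function name dmac is a hypothetical helper. Two simplifications are assumptions of the sketch: Python floats stand in for double-precision operands, and the product and the sum are each rounded by Python, which may differ from the processor's single rounding of the compound result.

```python
def dmac(func, acc, srca, srcb):
    """Return the new accumulator value for FUNC 0000 through 1011."""
    if not 0 <= func <= 0b1011:
        raise ValueError("only FUNC 0000-1011 are modeled in this sketch")
    # Bit 3 replaces SRCB with 1.0; bit 0 negates the multiplier.
    multiplier = 1.0 if func & 0b1000 else srcb
    if func & 0b0001:
        multiplier = -multiplier
    # Bit 2 replaces the accumulator operand with 0.0.
    addend = 0.0 if func & 0b0100 else acc
    product = srca * multiplier
    # Bit 1 selects subtraction of the third operand.
    return product - addend if func & 0b0010 else product + addend


print(dmac(0b0000, acc=1.0, srca=2.0, srcb=3.0))  # prints 7.0
print(dmac(0b0001, acc=1.0, srca=2.0, srcb=3.0))  # prints -5.0
```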

DMSM

Floating-Point Multiply-Sum, Double-Precision

Operation: DEST (double-precision) <- SRCA (double-precision) * ACC(0)
(double-precision) + SRCB (double-precision)

Assembler
Syntax: DMSM rc, ra, rb

Status: fpU, fpV, fpR, fpN

Operands: SRCA Content of register RA and the twin of register RA

SRCB Content of register RB and the twin of register RB

ACC(0) (Content of) Accumulator register 0

DEST Register RC and the twin of register RC

Instruction format (bits 31-24 | 23-16 | 15-8 | 7-0): 11011011 | RC | RA | RB

OP = DB



DMSM 



Description: The SRCA operand is multiplied by the ACC(0) operand, and the
product added to the SRCB operand; the result is rounded to
double-precision format according to Floating-Point Environment
Register field FRM, and placed into the DEST location. Operands
SRCA, SRCB, and ACC(0) are double-precision floating-point
numbers. The Accumulator Format field of the Floating-Point
Environment Register must specify double-precision.

Note that the DMSM instruction uses the fast float mode of operation,
regardless of the state of the Fast Float Select bit in the
Floating-Point Environment Register. The DMSM instruction never causes
a Floating-Point Exception trap; it updates the sticky status bits
instead. Furthermore, the DMSM instruction never sets the Inexact
Sticky bit, regardless of the result.

DMUL

Floating-Point Multiply, Double-Precision

Operation: DEST (double-precision) <- SRCA (double-precision) *
SRCB (double-precision)

Assembler
Syntax: DMUL rc, ra, rb

Status: fpX, fpU, fpV, fpR, fpN

Operands: SRCA Content of register RA and the twin of register RA

SRCB Content of register RB and the twin of register RB

DEST Register RC

Instruction format (bits 31-24 | 23-16 | 15-8 | 7-0): 11110101 | RC | RA | RB

OP = F5



DMUL



Description: The SRCB operand is multiplied by the SRCA operand; the result is 
rounded according to FRM field of the Floating-Point Environment 
Register and placed into the DEST location. The operands and result 
of the multiplication are double-precision floating-point numbers. 

DSUB

Floating-Point Subtract, Double-Precision

Operation: DEST (double-precision) <- SRCA (double-precision) -
SRCB (double-precision)

Assembler
Syntax: DSUB rc, ra, rb

Status: fpX, fpU, fpV, fpR, fpN

Operands: SRCA Content of register RA and the twin of register RA

SRCB Content of register RB and the twin of register RB

DEST Register RC

Instruction format (bits 31-24 | 23-16 | 15-8 | 7-0): 11110011 | RC | RA | RB

OP = F3



DSUB 



Description: The SRCB operand is subtracted from the SRCA operand; the result 
is rounded according to FRM field of the Floating-Point Environment 
Register and placed into the DEST location. The operands and result 
of the subtraction are double-precision floating-point numbers. 

EMULATE

Trap to Software Emulation Routine

Operation: Load IPA and IPB registers with operand register-numbers
and Trap (VN)

Assembler
Syntax: EMULATE vn, ra, rb

Status: Not affected

Operands: Absolute-register numbers for registers RA and RB

VN Trap vector number

Instruction format (bits 31-24 | 23-16 | 15-8 | 7-0): 11010111 | VN | RA | RB

OP = D7



EMULATE 



Description: The IPA and IPB registers are set to the register numbers of registers
RA and RB, respectively. A trap with the specified vector number
occurs.

Note that the IPC register also is affected by this instruction, but that
its value has no interpretation.

For programs in the User mode, a Protection Violation trap occurs,
instead of the EMULATE trap, if a vector number between 0 and 63
is specified.

EXBYTE

Extract Byte

Operation: DEST <- SRCB, with low-order byte replaced by byte in
SRCA selected by BP

Assembler
Syntax: EXBYTE rc, ra, rb
or
EXBYTE rc, ra, const8

Status: Not affected

Operands: SRCA Content of register RA

SRCB M = 0: Content of register RB
M = 1: I (Zero-extended to 32 bits)

DEST Register RC

Instruction format (bits 31-24 | 23-16 | 15-8 | 7-0): 0000101M | RC | RA | RB or I

OP = 0A, 0B



EXBYTE 



Description: A byte in the SRCA operand is selected by the Byte Position field of
the ALU Status Register and the Byte Order bit of the Configuration
Register. The selected byte replaces the low-order byte of the SRCB
operand and the resulting word is placed into the DEST location.

Note: The selection of bytes within words is specified in Section 3.4.5.
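
The data movement performed by EXBYTE can be sketched as follows. The byte-selection rule itself is defined in Section 3.4.5, which is not reproduced here, so the mapping used below is an assumption of this sketch: with the Byte Order bit at 0, byte position 0 is taken to be the high-order byte, and with the bit at 1, byte position 0 is taken to be the low-order byte. The function name and the parameter names bp and bo are hypothetical.

```python
def exbyte(srca, srcb, bp, bo=0):
    """Replace the low-order byte of SRCB with a byte chosen from SRCA.

    bp: byte position, 0 to 3.
    bo: byte order bit (assumed: 0 numbers bytes from the high-order
        end, 1 numbers them from the low-order end).
    """
    if bp not in range(4):
        raise ValueError("byte position must be 0 to 3")
    shift = 8 * bp if bo else 8 * (3 - bp)
    selected = (srca >> shift) & 0xFF
    return (srcb & 0xFFFFFF00) | selected


print(f"{exbyte(0x11223344, 0xAABBCCDD, bp=1):08X}")        # prints AABBCC22
print(f"{exbyte(0x11223344, 0xAABBCCDD, bp=1, bo=1):08X}")  # prints AABBCC33
```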

EXHW

Extract Half-Word

Operation: DEST <- SRCB, with low-order half-word replaced by half-word in
SRCA selected by BP

Assembler
Syntax: EXHW rc, ra, rb
or
EXHW rc, ra, const8

Status: Not affected

Operands: SRCA Content of register RA

SRCB M = 0: Content of register RB
M = 1: I (Zero-extended to 32 bits)

DEST Register RC

Instruction format (bits 31-24 | 23-16 | 15-8 | 7-0): 0111110M | RC | RA | RB or I

OP = 7C, 7D



EXHW 



Description: A half-word in the SRCA operand is selected by the Byte Position 
field of the ALU Status Register and the Byte Order bit of the 
Configuration Register. The selected half-word replaces the 
low-order half-word of the SRCB operand, and the resulting word is 
placed into the DEST location. 

Note: The selection of half-words within words is specified in 
Section 3.4.5. 

EXHWS

Extract Half-Word, Sign-Extended

Operation: DEST <- half-word in SRCA selected by BP,
sign-extended to 32 bits

Assembler
Syntax: EXHWS rc, ra

Status: Not affected

Operands: SRCA Content of register RA

DEST Register RC

Instruction format (bits 31-24 | 23-16 | 15-8 | 7-0): 01111110 | RC | RA | Reserved

OP = 7E



EXHWS 



Description: A half-word in the SRCA operand is selected by the Byte Position 
field of the ALU Status Register and the Byte Order bit of the 
Configuration Register. The selected half-word is sign-extended to 32 
bits, and the resulting word is placed into the DEST location. 

Note: The selection of half-words within words is specified in 
Section 3.4.5. 
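
The sign-extension step of EXHWS can be illustrated independently of the selection rule. In this minimal sketch the caller states directly which half-word is selected through the hypothetical select_high parameter; how the Byte Position field and the Byte Order bit make that choice is left to Section 3.4.5.

```python
def exhws(srca, select_high):
    """Return the chosen half-word of SRCA, sign-extended to 32 bits."""
    if select_high:
        half = (srca >> 16) & 0xFFFF
    else:
        half = srca & 0xFFFF
    # Bit 15 of the half-word is copied into bits 31-16 of the result.
    return (half | 0xFFFF0000) if half & 0x8000 else half


print(f"{exhws(0x80017FFF, True):08X}")   # prints FFFF8001
print(f"{exhws(0x80017FFF, False):08X}")  # prints 00007FFF
```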

EXTRACT

Extract Word, Bit-Aligned

Operation: DEST <- high-order word of (SRCA//SRCB << FC)

Assembler
Syntax: EXTRACT rc, ra, rb
or
EXTRACT rc, ra, const8

Status: Not affected

Operands: SRCA Content of register RA

SRCB M = 0: Content of register RB
M = 1: I (Zero-extended to 32 bits)

DEST Register RC

Instruction format (bits 31-24 | 23-16 | 15-8 | 7-0): 0111101M | RC | RA | RB or I

OP = 7A, 7B



EXTRACT 



Description: The SRCB operand is appended to the SRCA operand, and the
resulting 64-bit value is shifted left by the number of bit-positions
specified by the Funnel Shift Count (FC) field of the ALU Status
Register. The high-order 32 bits of the 64-bit shifted value are placed
in the DEST location.

If the SRCB operand is the same as the SRCA operand, the 
EXTRACT instruction performs a rotate operation. 
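
The funnel shift, and the rotate that results when both operands are the same, can be sketched in a few lines. The function name extract is a hypothetical helper, and the sketch assumes a shift count in the range 0 to 31.

```python
MASK32 = 0xFFFFFFFF


def extract(srca, srcb, fc):
    """Return the high-order word of (SRCA//SRCB) shifted left by fc."""
    if not 0 <= fc <= 31:
        raise ValueError("shift count assumed to be 0 to 31")
    wide = ((srca & MASK32) << 32) | (srcb & MASK32)
    return ((wide << fc) >> 32) & MASK32


# Two different operands: bits from SRCB enter at the low-order end.
print(f"{extract(0x12345678, 0x9ABCDEF0, 8):08X}")  # prints 3456789A
# Same operand twice: a left rotate by fc bits.
print(f"{extract(0x12345678, 0x12345678, 8):08X}")  # prints 34567812
```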

FADD

Floating-Point Add, Single-Precision

Operation: DEST (single-precision) <- SRCA (single-precision) +
SRCB (single-precision)

Assembler
Syntax: FADD rc, ra, rb

Status: fpX, fpU, fpV, fpR, fpN

Operands: SRCA Content of register RA

SRCB Content of register RB

DEST Register RC

Instruction format (bits 31-24 | 23-16 | 15-8 | 7-0): 11110000 | RC | RA | RB

OP = F0



FADD 



Description: The SRCA operand is added to the SRCB operand; the result is
rounded according to the FRM field of the Floating-Point Environment
Register and placed into the DEST location. The operands and result
of the addition are single-precision floating-point numbers.

FDIV

Floating-Point Divide, Single-Precision

Operation: DEST (single-precision) <- SRCA (single-precision) /
SRCB (single-precision)

Assembler
Syntax: FDIV rc, ra, rb

Status: fpD, fpX, fpU, fpV, fpR, fpN

Operands: SRCA Content of register RA

SRCB Content of register RB

DEST Register RC

Instruction format (bits 31-24 | 23-16 | 15-8 | 7-0): 11110110 | RC | RA | RB

OP = F6

Description: The SRCA operand is divided by the SRCB operand; the result is
rounded according to the FRM field of the Floating-Point Environment
Register and placed into the DEST location. The operands and result
of the division are single-precision floating-point numbers.

FDMUL

Floating-Point Multiply, Single-to-Double Precision

Operation: DEST (double-precision) <- SRCA (single-precision) *
SRCB (single-precision)

Assembler
Syntax: FDMUL rc, ra, rb

Status: fpR, fpN

Operands: SRCA Content of register RA

SRCB Content of register RB

DEST Register RC

Instruction format (bits 31-24 | 23-16 | 15-8 | 7-0): 11111001 | RC | RA | RB

OP = F9



FDMUL 



Description: The SRCB operand is multiplied by the SRCA operand; the result is 
placed into the DEST location. SRCA and SRCB are single-precision 
floating-point numbers; the result is produced in double-precision 
format. Because the product of two single-precision operands can 
always be represented exactly as a double-precision number, the 
FDMUL result does not depend on the FRM field of the Floating-Point 
Environment Register. 
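
The exactness claim for FDMUL can be checked numerically. The sketch below is an illustration on a host machine, not a model of the processor: it assumes Python floats are IEEE double-precision values and uses the standard struct module to round inputs to single precision. The helper names are hypothetical.

```python
import struct
from fractions import Fraction


def to_single(value):
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack(">f", struct.pack(">f", value))[0]


def fdmul(srca, srcb):
    """Multiply two single-precision values, keeping a double result."""
    return to_single(srca) * to_single(srcb)


a = to_single(0.1)
b = to_single(3.3)
# Fractions give the mathematically exact product of the two inputs.
exact = Fraction(a) * Fraction(b)
print(Fraction(fdmul(a, b)) == exact)  # prints True
```

Each single-precision significand has 24 bits, so the product needs at most 48 bits, which fits within the 53-bit significand of a double-precision number.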

FEQ

Floating-Point Equal To, Single-Precision

Operation: IF SRCA (single-precision) = SRCB (single-precision)
THEN DEST <- TRUE
ELSE DEST <- FALSE

Assembler
Syntax: FEQ rc, ra, rb

Status: fpN

Operands: SRCA Content of register RA

SRCB Content of register RB

DEST Register RC

Instruction format (bits 31-24 | 23-16 | 15-8 | 7-0): 11101010 | RC | RA | RB

OP = EA



FEQ 



Description: If the SRCA operand is equal to the SRCB operand, a Boolean TRUE
is placed into the DEST location; otherwise, a Boolean FALSE is
placed into the DEST location. SRCA and SRCB are single-precision
floating-point numbers.

Note: The rounding mode specified by the FRM field of the 
Floating-Point Environment Register has no effect on this operation. 

FGE

Floating-Point Greater Than Or Equal To, Single-Precision

Operation: IF SRCA (single-precision) >= SRCB (single-precision)
THEN DEST <- TRUE
ELSE DEST <- FALSE

Assembler
Syntax: FGE rc, ra, rb

Status: fpN

Operands: SRCA Content of register RA

SRCB Content of register RB

DEST Register RC

Instruction format (bits 31-24 | 23-16 | 15-8 | 7-0): 11101110 | RC | RA | RB

OP = EE



FGE 



Description: If the SRCA operand is greater than or equal to the SRCB operand, a
Boolean TRUE is placed into the DEST location; otherwise, a
Boolean FALSE is placed into the DEST location. SRCA and SRCB
are single-precision floating-point numbers.

Note: The rounding mode specified by the FRM field of the Floating- 
Point Environment Register has no effect on this operation. 
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FGT 

Floating-Point Greater Than, Single-Precision

Operation: IF SRCA (single-precision) > SRCB (single-precision)
THEN DEST <- TRUE
ELSE DEST <- FALSE

Assembler
Syntax: FGT rc, ra, rb

Status: fpN

Operands: SRCA Content of register RA
SRCB Content of register RB
DEST Register RC

Instruction format: bits 31-24 opcode; bits 23-16 RC; bits 15-8 RA; bits 7-0 RB

OP = EC

Description: If the SRCA operand is greater than the SRCB operand, a Boolean
TRUE is placed into the DEST location; otherwise, a Boolean FALSE
is placed into the DEST location. SRCA and SRCB are
single-precision floating-point numbers.

Note: The rounding mode specified by the FRM field of the Floating- 
Point Environment Register has no effect on this operation. 
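
The three single-precision comparisons above (FEQ, FGE, FGT) can be modeled as follows. This is a minimal sketch, not the processor's implementation: the helper names are hypothetical, the TRUE and FALSE encodings (bit 31 set or clear, all other bits zero) are an assumption, and the fpN status bit is not modeled.

```python
import struct

TRUE = 0x80000000   # assumed encoding: Boolean TRUE has bit 31 set
FALSE = 0x00000000  # assumed encoding: Boolean FALSE has bit 31 clear

def as_single(word):
    """Reinterpret a 32-bit register value as an IEEE single-precision number."""
    return struct.unpack(">f", struct.pack(">I", word & 0xFFFFFFFF))[0]

def feq(srca, srcb):
    """Model of FEQ: TRUE if SRCA = SRCB."""
    return TRUE if as_single(srca) == as_single(srcb) else FALSE

def fge(srca, srcb):
    """Model of FGE: TRUE if SRCA >= SRCB."""
    return TRUE if as_single(srca) >= as_single(srcb) else FALSE

def fgt(srca, srcb):
    """Model of FGT: TRUE if SRCA > SRCB."""
    return TRUE if as_single(srca) > as_single(srcb) else FALSE
```

The operands are register bit patterns, so 1.0 is passed as 0x3F800000 and 2.0 as 0x40000000. No rounding mode appears anywhere in the model, which matches the note that FRM has no effect on these operations.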
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FMAC 



FMAC 



Floating-Point Multiply-Accumulate, Single-Precision 



Operation: ACC(ACN) (variable-precision) <- SRCA (single-precision) *
SRCB (single-precision) + ACC(ACN) (variable-precision)

Assembler
Syntax: FMAC FUNC, ACN, ra, rb

Status: fpU, fpV, fpR, fpN

Operands: SRCA Content of register RA
SRCB Content of register RB
ACC(ACN) (Content of) Accumulator register ACN

Control: FUNC Modifies operation as shown in the table below
ACN Accumulator register number (0, 1, 2, 3)

Instruction format: bits 31-24 opcode; bits 23-16 control fields (Res, FUNC); bits 15-8 RA; bits 7-0 RB

OP = D8

Description: A compound operation of the form (OP1 * OP2) + OP3 is performed,
where OP1 and OP2 are single-precision operands, and OP3 is an
operand having the format specified by the Accumulator Format (ACF) field
of the Floating-Point Environment Register. Operand sources and
optional sign changes are specified by FUNC, as described in the
table below. The result is rounded and stored in ACC(ACN), in the
format specified by ACF.

Note that the FMAC instruction uses the fast float mode of operation, 
regardless of the state of the Fast Float Select bit in the Floating- 
Point Environment Register. The FMAC instruction never causes a 
Floating-Point Exception trap— it updates the sticky status bits 
instead. Furthermore, the FMAC instruction never sets the Inexact 
Sticky bit, regardless of the result. 



FUNC    Operation performed

0000    (SRCA * SRCB) + ACC(ACN)
0001    (SRCA * -SRCB) + ACC(ACN)
0010    (SRCA * SRCB) - ACC(ACN)
0011    (SRCA * -SRCB) - ACC(ACN)

0100    (SRCA * SRCB) + 0.0
0101    (SRCA * -SRCB) + 0.0
0110    (SRCA * SRCB) - 0.0
0111    (SRCA * -SRCB) - 0.0

1000    (SRCA * 1.0) + ACC(ACN)
1001    (SRCA * -1.0) + ACC(ACN)
1010    (SRCA * 1.0) - ACC(ACN)
1011    (SRCA * -1.0) - ACC(ACN)

1100    (SRCA * 1.0) + 0.0
1101    (SRCA * -1.0) + 0.0
1110    (SRCA * 1.0) - 0.0
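
The FUNC table can be read one bit at a time. The sketch below assumes that reading: bit 0 negates the multiplier, bit 1 subtracts the third operand instead of adding it, bit 2 substitutes 0.0 for ACC(ACN), and bit 3 substitutes 1.0 for SRCB. The function name is hypothetical, and the sketch works on ordinary Python floats, so it does not model accumulator formats, rounding, or the sticky status bits.

```python
def fmac_value(func, srca, srcb, acc):
    """Return the unrounded value selected by a 4-bit FUNC code.

    Assumed bit meanings (inferred from the FUNC table, not stated in the text):
      bit 3: multiply by 1.0 instead of SRCB
      bit 2: use 0.0 instead of ACC(ACN)
      bit 1: subtract the third operand instead of adding it
      bit 0: negate the multiplier
    """
    if not 0 <= func <= 0b1110:
        raise ValueError("FUNC code is not listed in the table")
    multiplier = 1.0 if func & 0b1000 else srcb
    third = 0.0 if func & 0b0100 else acc
    if func & 0b0001:
        multiplier = -multiplier
    if func & 0b0010:
        third = -third
    return srca * multiplier + third
```

For example, with SRCA = 2.0, SRCB = 3.0, and ACC(ACN) = 1.0, code 0000 gives 7.0 and code 0001 gives -5.0.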
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FMSM 



Floating-Point Multiply-Sum, Single-Precision

Operation: DEST (single-precision) <- SRCA (single-precision) *
ACC(0) (single-precision) + SRCB (single-precision)

Assembler
Syntax: FMSM rc, ra, rb

Status: fpU, fpV, fpR, fpN

Operands: SRCA Content of register RA
SRCB Content of register RB
ACC(0) Content of accumulator register 0
DEST Register RC

Instruction format: bits 31-24 opcode; bits 23-16 RC; bits 15-8 RA; bits 7-0 RB

OP = DA

Description: The SRCA operand is multiplied by the ACC(0) operand, and the
product added to the SRCB operand; the result is rounded to
single-precision format according to Floating-Point Environment
Register field FRM, and placed into the DEST location. Operands
SRCA, SRCB, and ACC(0) are single-precision floating-point
numbers. The Accumulator Format field of the Floating-Point
Environment Register must specify single-precision.

Note that the FMSM instruction uses the fast-float mode of operation,
regardless of the state of the Fast-Float Select bit in the Floating-
Point Environment Register. The FMSM instruction never causes a
Floating-Point Exception trap; it updates the sticky status bits
instead. Furthermore, the FMSM instruction never sets the Inexact
Sticky bit, regardless of the result.
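
Because FMSM computes SRCA * ACC(0) + SRCB with ACC(0) held fixed, it lends itself to step-by-step polynomial evaluation (Horner's rule). That use is an illustration chosen here, not something the text above states. The sketch is approximate: it rounds to nearest only, ignores FRM, fast-float flushing, and the status bits, and all function names are hypothetical.

```python
import struct

def round_single(x):
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

def fmsm(srca, srcb, acc0):
    """Model of FMSM: DEST <- SRCA * ACC(0) + SRCB, rounded to single precision."""
    return round_single(srca * acc0 + srcb)

def horner(coefficients, x):
    """Evaluate a polynomial, highest-order coefficient first, one FMSM per step."""
    acc0 = round_single(x)                    # x stays in ACC(0) throughout
    result = round_single(coefficients[0])
    for coefficient in coefficients[1:]:
        result = fmsm(result, round_single(coefficient), acc0)
    return result
```

For example, horner([2.0, 3.0, 4.0], 2.0) evaluates 2x^2 + 3x + 4 at x = 2 in two FMSM steps.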
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FMUL 



Floating-Point Multiply, Single-Precision 



Operation: DEST (single-precision) <- SRCA (single-precision) *
SRCB (single-precision)

Assembler
Syntax: FMUL rc, ra, rb

Status: fpX, fpU, fpV, fpR, fpN

Operands: SRCA Content of register RA
SRCB Content of register RB
DEST Register RC

Instruction format: bits 31-24 opcode; bits 23-16 RC; bits 15-8 RA; bits 7-0 RB

OP = F4

Description: The SRCA operand is multiplied by the SRCB operand; the result is
rounded according to the FRM field of the Floating-Point Environment
Register and placed into the DEST location. The operands and result
of the multiplication are single-precision floating-point numbers.
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FSUB 



FSUB 



Floating-Point Subtract, Single-Precision 



Operation: DEST (single-precision) <- SRCA (single-precision) -
SRCB (single-precision)

Assembler
Syntax: FSUB rc, ra, rb

Status: fpX, fpU, fpV, fpR, fpN

Operands: SRCA Content of register RA
SRCB Content of register RB
DEST Register RC

Instruction format: bits 31-24 opcode; bits 23-16 RC; bits 15-8 RA; bits 7-0 RB

OP = F2

Description: The SRCB operand is subtracted from the SRCA operand; the result
is rounded according to the FRM field of the Floating-Point Environment
Register and placed into the DEST location. The operands and result
of the subtraction are single-precision floating-point numbers.






HALT 



Enter Halt Mode 



HALT 



Operation: Enter Halt mode on next cycle 

Assembler 

Syntax: HALT 

Status: Not affected 

Operands: Not applicable 



Instruction format: bits 31-24 opcode; bits 23-0 reserved

OP = 89



Description: The processor is placed into the Halt mode on the next cycle, except 
that any external data accesses are completed. 

This instruction may be executed only by Supervisor-mode programs. 
An attempted execution by a User-mode program causes a 
Protection Violation trap to occur. 

If the instruction following a Halt instruction has an exception (e.g., 
TLB Miss), the trap associated with this exception is taken before the
processor enters the Halt mode. 
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rNBYTE 



Insert Byte 



INBYTE 



Operation: DEST <- SRCA, with byte selected by BP
replaced by low-order byte of SRCB

Assembler
Syntax: INBYTE rc, ra, rb
or
INBYTE rc, ra, const8

Status: Not affected

Operands: SRCA Content of register RA
SRCB M = 0: Content of register RB
M = 1: I (Zero-extended to 32 bits)
DEST Register RC

Instruction format: bits 31-24 opcode (low-order bit is M); bits 23-16 RC; bits 15-8 RA; bits 7-0 RB or I

OP = 0C, 0D



Description: A byte in the SRCA operand is selected by the Byte Position field of 
the ALU Status Register and the Byte Order bit of the Configuration 
Register. The selected byte is replaced by the low-order byte of the 
SRCB operand, and the resulting word is placed into the DEST 
location. 

Note: The selection of bytes within words is specified in Section 3.4.5. 
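
The byte replacement can be sketched as a mask-and-merge on a 32-bit word. The byte numbering is an assumption made for this sketch, since it is defined in Section 3.4.5 rather than here: with the `big_endian` flag set (standing in for one setting of the Byte Order bit), byte position 0 is taken to be bits 31-24; otherwise byte position 0 is bits 7-0. The function and parameter names are hypothetical.

```python
def inbyte(srca, srcb, bp, big_endian=True):
    """Model of INBYTE: replace the byte of SRCA selected by bp (0..3)
    with the low-order byte of SRCB and return the resulting word."""
    if not 0 <= bp <= 3:
        raise ValueError("byte position must be 0..3")
    # Assumed numbering: position 0 is the high-order byte when big_endian.
    shift = (3 - bp) * 8 if big_endian else bp * 8
    mask = 0xFF << shift
    return (srca & ~mask & 0xFFFFFFFF) | ((srcb & 0xFF) << shift)
```

Only the low-order byte of SRCB takes part, so any higher-order bits in SRCB are ignored.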
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INHW 



Insert Half-Word

Operation: DEST <- SRCA, with half-word selected by BP replaced by
low-order half-word of SRCB

Assembler
Syntax: INHW rc, ra, rb
or
INHW rc, ra, const8

Status: Not affected

Operands: SRCA Content of register RA
SRCB M = 0: Content of register RB
M = 1: I (Zero-extended to 32 bits)
DEST Register RC

Instruction format: bits 31-24 opcode (low-order bit is M); bits 23-16 RC; bits 15-8 RA; bits 7-0 RB or I

OP = 78, 79

Description: A half-word in the SRCA operand is selected by the Byte Position
field of the ALU Status Register and the Byte Order bit of the
Configuration Register. The selected half-word is replaced by the
low-order half-word of the SRCB operand, and the resulting word is
placed into the DEST location.

Note: The selection of half-words within words is specified in
Section 3.4.5.






INV 

Invalidate

Operation: Reset all valid bits in Branch Target Cache memory

Assembler
Syntax: INV

Status: Not affected

Operands: Not applicable

Instruction format: bits 31-24 opcode; bits 23-0 reserved

OP = 9F



Description: This instruction causes all Branch Target Cache memory valid bits to 
be reset, on the execution of the next successful branch. This causes 
all Branch Target Cache memory locations to become invalid. 

This instruction may be executed only by Supervisor-mode programs. 
An attempted execution by a User-mode program causes a 
Protection Violation trap to occur. 
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IRET 



Interrupt Return 



Operation: Perform an interrupt return sequence 

Assembler 

Syntax: IRET 

Status: Not affected 

Operands: Not applicable 



Instruction format: bits 31-24 opcode; bits 23-0 reserved

OP = 88

Description: This instruction performs the interrupt return sequence described in
Section 3.5.5.

This instruction may be executed only by Supervisor-mode programs.
An attempted execution by a User-mode program causes a
Protection Violation trap to occur.
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IRETINV 



IRETINV 



Interrupt Return and Invalidate 



Operation: Perform an interrupt return sequence, and reset all valid bits in 
Branch Target Cache memory 

Assembler 

Syntax: IRETINV 

Status: Not affected 

Operands: Not applicable 



Instruction format: bits 31-24 opcode; bits 23-0 reserved

OP = 8C



Description: This instruction performs the interrupt return sequence described in 
Section 3.5.5. When the sequence begins, all Branch Target Cache 
memory valid bits are reset to zeros. This causes all Branch Target 
Cache memory locations to become invalid. 

This instruction may be executed only by Supervisor-mode programs. 
An attempted execution by a User-mode program causes a 
Protection Violation trap to occur. 






JMP 



JMP 



Jump 



Operation: PC <- TARGET
Execute delay instruction

Assembler
Syntax: JMP target

Status: Not affected

Operands: TARGET A = 0: I17...I10//I9...I2 (sign-extended to 30 bits) + PC
A = 1: I17...I10//I9...I2 (zero-extended to 30 bits)

Instruction format: bits 31-24 opcode (low-order bit is A); bits 23-16 I17...I10; bits 15-8 reserved; bits 7-0 I9...I2

OP = A0, A1



Description: A non-sequential instruction fetch occurs to the instruction address 
given by the TARGET operand. The instruction following the JMP is 
executed before the non-sequential fetch occurs. 
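
The TARGET computation can be sketched as follows. The sketch assumes the field layout given by the instruction format (A in bit 24, I17...I10 in bits 23-16, I9...I2 in bits 7-0) and treats both PC and the result as 30-bit word addresses; a byte address would be the word address shifted left by two. The function name is hypothetical, and which PC value the hardware adds is not modeled: the caller supplies it.

```python
def jmp_target(instruction, pc):
    """Return the 30-bit word address selected by a JMP-format instruction."""
    a_bit = (instruction >> 24) & 1
    high = (instruction >> 16) & 0xFF    # I17...I10
    low = instruction & 0xFF             # I9...I2
    offset = (high << 8) | low           # the 16-bit field I17...I10//I9...I2
    if a_bit:
        return offset                    # A = 1: zero-extended, absolute
    if offset & 0x8000:
        offset -= 0x10000                # A = 0: sign-extend the 16-bit field
    return (pc + offset) & 0x3FFFFFFF    # relative to PC, wrapped to 30 bits
```

The same computation applies to the other instructions in this chapter that list a TARGET operand in this form (JMPF, JMPFDEC, JMPT).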






JMPF 



JMPF 



Jump False 



Operation: IF SRCA = FALSE THEN PC <- TARGET
Execute delay instruction

Assembler
Syntax: JMPF ra, target

Status: Not affected

Operands: SRCA Content of register RA
TARGET A = 0: I17...I10//I9...I2 (sign-extended to 30 bits) + PC
A = 1: I17...I10//I9...I2 (zero-extended to 30 bits)

Instruction format: bits 31-24 opcode (low-order bit is A); bits 23-16 I17...I10; bits 15-8 RA; bits 7-0 I9...I2

OP = A4, A5

Description: If SRCA is a Boolean FALSE, a non-sequential instruction fetch
occurs to the instruction address given by the TARGET operand.

If SRCA is a Boolean TRUE, this instruction has no effect.

The instruction following the JMPF is executed regardless of the
value of SRCA.






JMPFDEC 



Jump False and Decrement

Operation: IF SRCA = FALSE THEN
SRCA <- SRCA - 1
PC <- TARGET
ELSE
SRCA <- SRCA - 1
Execute delay instruction

Assembler
Syntax: JMPFDEC ra, target

Status: Not affected

Operands: SRCA Content of register RA
TARGET A = 0: I17...I10//I9...I2 (sign-extended to 30 bits) + PC
A = 1: I17...I10//I9...I2 (zero-extended to 30 bits)

Instruction format: bits 31-24 opcode (low-order bit is A); bits 23-16 I17...I10; bits 15-8 RA; bits 7-0 I9...I2

OP = B4, B5

Description: If SRCA is a Boolean FALSE, a non-sequential instruction fetch
occurs to the instruction address given by the TARGET operand.

If SRCA is a Boolean TRUE, this instruction has no effect on the
instruction-execution sequence.

The SRCA operand is decremented by one, regardless of whether or
not the non-sequential instruction fetch occurs. Note that a negative
number for the SRCA operand is a Boolean TRUE.

The instruction following the JMPFDEC is executed regardless of the
value of SRCA.
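
The interaction between the Boolean test and the decrement matters when JMPFDEC closes a counted loop. The sketch below is a model, not processor code: it assumes the instruction sits at the bottom of the loop and branches back to the top, it ignores the delay instruction, and its function names are hypothetical. The jump is taken while the counter is non-negative, since a negative number is a Boolean TRUE.

```python
def is_true(word):
    """A word is a Boolean TRUE when it is negative, i.e. bit 31 is set."""
    return bool(word & 0x80000000)

def jmpfdec(counter):
    """Return (jump_taken, new_counter) for one execution of JMPFDEC."""
    taken = not is_true(counter)                # jump if SRCA is FALSE
    return taken, (counter - 1) & 0xFFFFFFFF    # decrement happens either way

def loop_iterations(initial):
    """Count how often the body of a JMPFDEC-terminated loop runs."""
    counter, iterations = initial, 0
    while True:
        iterations += 1                         # the loop body would run here
        taken, counter = jmpfdec(counter)
        if not taken:
            return iterations
```

Under these assumptions a loop whose counter starts at N (non-negative) runs its body N + 2 times: the test sees N, N-1, ..., 0 as FALSE and only the following value, -1, as TRUE.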
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JMPFI 



Jump False Indirect 



Operation: IF SRCA= FALSE THEN PC<-SRCB 
Execute delay instruction 

Assembler 

Syntax: JMPFI ra, rb 

Status: Not affected 

Operands: SRCA Content of register RA 

SRCB Content of register RB 



Instruction format: bits 31-24 opcode; bits 23-16 reserved; bits 15-8 RA; bits 7-0 RB

OP = C4

Description: If SRCA is a Boolean FALSE, a non-sequential instruction fetch
occurs to the instruction address given by the SRCB operand.

If SRCA is a Boolean TRUE, this instruction has no effect.

The instruction following the JMPFI is executed regardless of the
value of SRCA.






JMPI 



Jump Indirect 



Operation: PC <- SRCB
Execute delay instruction

Assembler
Syntax: JMPI rb

Status: Not affected

Operands: SRCB Content of register RB

Instruction format: bits 31-24 opcode; bits 23-8 reserved; bits 7-0 RB

OP = C0

Description: A non-sequential instruction fetch occurs to the instruction address
given by the SRCB operand. The instruction following the JMPI is
executed before the non-sequential fetch occurs.






JMPT 



Jump True

Operation: IF SRCA = TRUE THEN PC <- TARGET
Execute delay instruction

Assembler
Syntax: JMPT ra, target

Status: Not affected

Operands: SRCA Content of register RA
TARGET A = 0: I17...I10//I9...I2 (sign-extended to 30 bits) + PC
A = 1: I17...I10//I9...I2 (zero-extended to 30 bits)

Instruction format: bits 31-24 opcode (low-order bit is A); bits 23-16 I17...I10; bits 15-8 RA; bits 7-0 I9...I2

OP = AC, AD

Description: If SRCA is a Boolean TRUE, a non-sequential instruction fetch occurs
to the instruction address given by the TARGET operand.

If SRCA is a Boolean FALSE, this instruction has no effect.

The instruction following the JMPT is executed regardless of the
value of SRCA.






JMPTI 



Jump True Indirect

Operation: IF SRCA = TRUE THEN PC <- SRCB
Execute delay instruction

Assembler
Syntax: JMPTI ra, rb

Status: Not affected

Operands: SRCA Content of register RA
SRCB Content of register RB

Instruction format: bits 31-24 opcode; bits 23-16 reserved; bits 15-8 RA; bits 7-0 RB

OP = CC



Description: If the SRCA is a Boolean TRUE, a non-sequential instruction fetch 
occurs to the instruction address given by the SRCB operand. 

If SRCA is a Boolean FALSE, this instruction has no effect. 

The instruction following the JMPTI is executed regardless of the 
value of SRCA. 






LOAD 



Load

Operation: DEST <- EXTERNAL WORD [SRCB]

Assembler
Syntax: LOAD ce, cntl, ra, rb
or
LOAD ce, cntl, ra, const8

Status: Not affected

Operands: SRCB M = 0: Content of register RB
M = 1: I (Zero-extended to 32 bits)
DEST Register RA

Instruction format: bits 31-24 opcode (low-order bit is M); bit 23 CE; bits 22-16 CNTL; bits 15-8 RA; bits 7-0 RB or I

OP = 16, 17

Description: If the CE bit is 0, the external word addressed by the SRCB operand
is placed into the DEST location.

If the CE bit is 1, a word is transferred from the coprocessor into the
DEST location. The SRCB operand has no pre-defined interpretation
in this case, though it appears on the address bus.

The CNTL field of the LOAD instruction affects the access or transfer
as described in Sections 3.4.4 and 6.1.2.
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LOADL 



LOADL 



Load and Lock 



Operation: DEST <- EXTERNAL WORD [SRCB],
assert LOCK output during access

Assembler
Syntax: LOADL ce, cntl, ra, rb
or
LOADL ce, cntl, ra, const8

Status: Not affected

Operands: SRCB M = 0: Content of register RB
M = 1: I (Zero-extended to 32 bits)
DEST Register RA

Instruction format: bits 31-24 opcode (low-order bit is M); bit 23 CE; bits 22-16 CNTL; bits 15-8 RA; bits 7-0 RB or I

OP = 06, 07

Description: If the CE bit is 0, the external word addressed by the SRCB operand
is placed into the DEST location.

If the CE bit is 1, a word is transferred from the coprocessor into the
DEST location. The SRCB operand has no pre-defined interpretation
in this case, though it appears on the address bus.

The CNTL field of the LOADL instruction affects the access or
transfer as described in Sections 3.4.4 and 6.1.2.

The LOCK output is asserted during the access or transfer.
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LOADM 



LOADM 



Load Multiple 



Operation: DEST ... DEST+COUNT <- EXTERNAL WORD [SRCB] ...
EXTERNAL WORD [SRCB + (COUNT * 4)]

Assembler
Syntax: LOADM ce, cntl, ra, rb
or
LOADM ce, cntl, ra, const8

Status: Not affected

Operands: SRCB M = 0: Content of register RB
M = 1: I (Zero-extended to 32 bits)
DEST Register RA

Instruction format: bits 31-24 opcode (low-order bit is M); bit 23 CE; bits 22-16 CNTL; bits 15-8 RA; bits 7-0 RB or I

OP = 36, 37

Description: If the CE bit is 0, external words at consecutive word addresses,
beginning with the word addressed by the SRCB operand, are placed
into consecutive registers, beginning with the DEST location.

If the CE bit is 1, multiple words are transferred from the coprocessor
into consecutive registers, beginning with the DEST location. The
SRCB operand has no pre-defined interpretation in this case.

The total number of words accessed or transferred in the sequence is
specified by the Count Remaining (CR) field of the Channel Control
Register (which also appears in the Load/Store Count Remaining
Register) at the beginning of the access. The total number of words is
the value of the CR field plus one. The CNTL field of the LOADM
instruction affects the access or transfer as described in Sections
3.4.4 and 6.1.2.

Note: The address and register-number sequences for the LOADM
instruction are specified in Section 3.4.4.
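
For the CE = 0 case, the word count and the pairing of addresses with registers can be sketched as below. This is a simplified model under stated assumptions: memory is a dictionary keyed by byte address, registers are a dictionary keyed by register number, consecutive words are 4 bytes apart, and consecutive registers are simply numbered upward. The actual address and register-number sequences are those of Section 3.4.4, which this sketch does not reproduce. All names are hypothetical.

```python
def loadm(memory, registers, srcb, dest, cr):
    """Model of LOADM with CE = 0: copy CR + 1 consecutive words into
    consecutive registers starting at register number dest."""
    count = cr + 1                    # total words = value of the CR field plus one
    for i in range(count):
        registers[dest + i] = memory[srcb + 4 * i]
    return count
```

For example, with CR = 2 and SRCB = 0x100, the words at 0x100, 0x104, and 0x108 are placed into three consecutive registers.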






LOADSET 



LOADSET 



Load and Set 



Operation: DEST <- EXTERNAL WORD [SRCB]
EXTERNAL WORD [SRCB] <- h'FFFFFFFF,
assert LOCK output during access

Assembler
Syntax: LOADSET ce, cntl, ra, rb
or
LOADSET ce, cntl, ra, const8

Status: Not affected

Operands: SRCB M = 0: Content of register RB
M = 1: I (Zero-extended to 32 bits)
DEST Register RA

Instruction format: bits 31-24 opcode (low-order bit is M); bit 23 CE; bits 22-16 CNTL; bits 15-8 RA; bits 7-0 RB or I

OP = 26, 27

Description: If the CE bit is 0, the external word addressed by the SRCB operand
is placed into the DEST location. After the DEST location is altered,
the external word addressed by the SRCB operand is written,
atomically, with a word consisting of a 1 in every bit position.

If the CE bit is 1, a word is transferred from the coprocessor into the
DEST location. The SRCB operand has no pre-defined interpretation
in this case, though it appears on the Address Bus. After the DEST
location is altered, a word consisting of a 1 in every bit position is
transferred, atomically, to the coprocessor.

The CNTL field of the LOADSET instruction affects the access or
transfer as described in Sections 3.4.4 and 6.1.2.

The LOCK output is asserted throughout the LOADSET operation.
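
The read-then-set behavior of LOADSET is what a test-and-set lock needs. The sketch below shows one possible use, which the text above does not prescribe: a lock word is treated as free when its old value is a Boolean FALSE (bit 31 clear) and as held otherwise. Memory is modeled as a dictionary, atomicity is simply assumed, and all names are hypothetical.

```python
ALL_ONES = 0xFFFFFFFF

def loadset(memory, address):
    """Model of LOADSET with CE = 0: return the old word and leave
    a word of all ones in its place (assumed atomic)."""
    old = memory[address]
    memory[address] = ALL_ONES
    return old

def try_acquire(memory, lock_address):
    """Attempt to take the lock; True means this caller now holds it."""
    old = loadset(memory, lock_address)
    return not (old & 0x80000000)     # old value FALSE means the lock was free

def release(memory, lock_address):
    """Release the lock by storing zero (a convention chosen for this sketch)."""
    memory[lock_address] = 0
```

A second try_acquire on a held lock reads back all ones and fails, while the lock word itself is left unchanged.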
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MFACC 

Move From Accumulator

Operation: DEST <- ACC(ACN)

Assembler
Syntax: MFACC rc, FMT, ACN

Status: fpX, fpU, fpV, fpR

Operands: DEST Register RC (single-precision f.p.)
or
Register RC and twin of Register RC
(double-precision f.p.)
ACC(ACN) Content of ACC(ACN)

Control: FMT Format of destination operand
00 Format specified by ACF
01 Single-precision floating-point
10 Double-precision floating-point
11 Reserved
ACN Accumulator number (0, 1, 2, or 3)

Instruction format: bits 31-24 opcode; bits 23-16 RC; bits 7-0 Res, FMT, and ACN fields

OP = E9



Description: The operand in accumulator register ACN is converted to format FMT 
and rounded according to Floating-Point Environment Register field 
FRM, then placed into the DEST location. The format of the operand 
read from accumulator register ACN is specified by Floating-Point 
Environment Register field ACF. 






MTACC 



Move To Accumulator

Operation: ACC(ACN) <- SRCA

Assembler
Syntax: MTACC ra, FMT, ACN

Status: fpX, fpU, fpV, fpR, fpN

Operands: SRCA Content of register RA (single-precision f.p.)
or
Content of register RA and the twin of
Register RA (double-precision f.p.)
ACC(ACN) Content of ACC(ACN)

Control: FMT Format of source operand
00 Format specified by ACF
01 Single-precision floating-point
10 Double-precision floating-point
11 Reserved
ACN Accumulator number (0, 1, 2, or 3)

Instruction format: bits 31-24 opcode; bits 15-8 RA; bits 7-0 Res, FMT, and ACN fields

OP = E8

Description: The SRCA operand is converted from format FMT and rounded
according to Floating-Point Environment Register field FRM, then
transferred to accumulator register ACC(ACN); the format of the
destination operand is specified by Floating-Point Environment
Register field ACF.

Note that the MTACC instruction uses the fast float mode of
operation, regardless of the Fast Float Select bit in the Floating-Point
Environment Register. A denormalized number is flushed to zero
before being written into the accumulator.
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MFSR 



MFSR 



Move from Special Register 



Operation: DEST <- SPECIAL

Assembler
Syntax: MFSR rc, spid

Status: Not affected

Operands: SPECIAL Content of special-purpose register SA
DEST Register RC

Instruction format: bits 31-24 opcode; bits 23-16 RC; bits 15-8 SA; bits 7-0 reserved

OP = C6

Description: The SPECIAL operand is placed into the DEST location.

For programs in the User mode, a Protection Violation trap occurs if
SA specifies a protected special-purpose register. If a trap occurs, the
DEST location is not altered.
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MFTLB 



MFTLB 



Move from Translation Look-Aside Buffer Register 



Operation: DEST <- TLB [SRCA]

Assembler
Syntax: MFTLB rc, ra

Status: Not affected

Operands: SRCA Content of register RA, bits 6...0
DEST Register RC

Instruction format: bits 31-24 opcode; bits 23-16 RC; bits 15-8 RA; bits 7-0 reserved

OP = B6

Description: The Translation Look-Aside Buffer (TLB) register whose register
number is specified by the SRCA operand is placed into the DEST
location.

This instruction may be executed only by Supervisor-mode programs.
An attempted execution by a User-mode program causes a
Protection Violation trap to occur. If a trap occurs, the DEST location
is not altered.
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MTSR 



MTSR 



Move to Special Register 



Operation: SPDEST <- SRCB

Assembler
Syntax: MTSR spid, rb

Status: Not affected, unless the destination is the ALU Status Register

Operands: SRCB Content of register RB
SPDEST Special-purpose register SA

Instruction format: bits 31-24 opcode; bits 23-16 reserved; bits 15-8 SA; bits 7-0 RB

OP = CE

Description: The SRCB operand is placed into the SPDEST location.

For programs in the User mode, a Protection Violation trap occurs if
SA specifies a protected special-purpose register. If a trap occurs, the
SPDEST location is not altered.
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MTSRIM 



MTSRIM 



Move to Special Register Immediate 



Operation: SPDEST <- 0I16

Assembler
Syntax: MTSRIM spid, const16

Status: Not affected, unless the destination is the ALU Status Register

Operands: 0I16 I15...I8//I7...I0 (zero-extended to 32 bits)
SPDEST Special-purpose register SA

Instruction format: bits 31-24 opcode; bits 23-16 I15...I8; bits 15-8 SA; bits 7-0 I7...I0

OP = 04

Description: The 0I16 operand is placed into the SPDEST location.

For programs in the User mode, a Protection Violation trap occurs if
SA specifies a protected special-purpose register. If a trap occurs, the
SPDEST location is not altered.
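
The 16-bit constant is split across two byte-wide fields of the instruction word. The sketch below packs and unpacks it, assuming the layout read from the format diagram (opcode 04 in bits 31-24, I15...I8 in bits 23-16, SA in bits 15-8, I7...I0 in bits 7-0). The function names are hypothetical, and the register number used in the example is arbitrary rather than a documented special-purpose register.

```python
MTSRIM_OPCODE = 0x04

def encode_mtsrim(sa, const16):
    """Build an MTSRIM instruction word from a register number and constant."""
    if not 0 <= sa <= 0xFF:
        raise ValueError("SA must fit in 8 bits")
    if not 0 <= const16 <= 0xFFFF:
        raise ValueError("constant must fit in 16 bits")
    high = const16 >> 8               # I15...I8
    low = const16 & 0xFF              # I7...I0
    return (MTSRIM_OPCODE << 24) | (high << 16) | (sa << 8) | low

def decode_constant(instruction):
    """Recover the zero-extended 0I16 operand from an instruction word."""
    return (((instruction >> 16) & 0xFF) << 8) | (instruction & 0xFF)
```

Because the operand is zero-extended, the upper 16 bits of the value written to SPDEST are always zero.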
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MTTLB 



Move to Translation Look-Aside Buffer Register 



Operation: TLB [SRCA]<-SRCB 

Assembler 

Syntax: MTTLB ra, rb 

Status: Not affected 

Operands: SRCA Content of register RA, bits 6...0 

SRCB Content of register RB 



Instruction format: bits 31-24 opcode; bits 23-16 reserved; bits 15-8 RA; bits 7-0 RB

OP = BE

Description: The SRCB operand is placed into the Translation Look-Aside Buffer
(TLB) register whose register number is specified by the SRCA
operand.

This instruction may be executed only by Supervisor-mode programs.
An attempted execution by a User-mode program causes a
Protection Violation trap to occur. If a trap occurs, the TLB register is
not altered.
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MUL 



Multiply Step

Operation: Perform one-bit step of a multiply operation

Assembler
Syntax: MUL rc, ra, rb
or
MUL rc, ra, const8

Status: V, N, Z, C

Operands: SRCA Content of register RA
SRCB M = 0: Content of register RB
M = 1: I (Zero-extended to 32 bits)
DEST Register RC

Instruction format: bits 31-24 opcode (low-order bit is M); bits 23-16 RC; bits 15-8 RA; bits 7-0 RB or I

OP = 64, 65

Description: If the least-significant bit of the Q Register is 1, the SRCA operand is
added to the SRCB operand. If the least-significant bit of the Q
Register is 0, a zero word is added to the SRCB operand.

The content of the Q Register is appended to the result of the add,
and the resulting 64-bit value is shifted right by one bit position; the
true sign of the result of the add fills the vacated bit position (i.e., the
sign of the result is complemented if an overflow occurred during the
add operation). The high-order 32 bits of the 64-bit shifted value are
placed into the DEST location. The low-order 32 bits of the shifted
value are placed into the Q Register.

This instruction is provided for compatibility with the Am29000
microprocessor.
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MULL 



MULL 



Multiply Last Step 



Operation: Complete a sequence of multiply steps (for signed multiply)

Assembler
Syntax: MULL rc, ra, rb
or
MULL rc, ra, const8

Status: V, N, Z, C

Operands: SRCA Content of register RA
SRCB M = 0: Content of register RB
M = 1: I (Zero-extended to 32 bits)
DEST Register RC

Instruction format: bits 31-24 opcode (low-order bit is M); bits 23-16 RC; bits 15-8 RA; bits 7-0 RB or I

OP = 66, 67

Description: If the least-significant bit of the Q Register is 1, the SRCA operand is
subtracted from the SRCB operand. If the least-significant bit of the Q
Register is 0, a zero word is subtracted from the SRCB operand.

The content of the Q Register is appended to the result of the
subtract, and the resulting 64-bit value is shifted right by one bit
position; the true sign of the result of the subtract fills the vacated bit
position (i.e., the sign of the result is complemented if an overflow
occurred during the subtract operation). The high-order 32 bits of the
64-bit shifted value are placed into the DEST location. The low-order
32 bits of the shifted value are placed into the Q Register.

This instruction is provided for compatibility with the Am29000
microprocessor.
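
The MUL and MULL steps combine into a full signed 32 x 32 multiply. The sketch below models both steps and one way of sequencing them, which is an assumption rather than a sequence given in the text: the multiplier is loaded into the Q Register, the partial product starts at zero, 31 MUL steps are issued, and a final MULL step accounts for the negative weight of the multiplier's sign bit. Status bits are not modeled, and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def signed32(word):
    """Interpret the low 32 bits of word as a two's-complement integer."""
    word &= MASK32
    return word - (1 << 32) if word & 0x80000000 else word

def multiply_step(srca, srcb, q, last=False):
    """One MUL step (or one MULL step when last=True); returns (DEST, new Q)."""
    addend = signed32(srca) if q & 1 else 0
    partial = signed32(srcb)
    # The exact signed result carries the "true sign" described in the text.
    total = partial - addend if last else partial + addend
    dest = (total >> 1) & MASK32                  # arithmetic shift, sign fills bit 31
    new_q = ((total & 1) << 31) | ((q & MASK32) >> 1)
    return dest, new_q

def multiply_signed(multiplicand, multiplier):
    """Return the signed 64-bit product using 31 MUL steps and one MULL step."""
    dest, q = 0, multiplier & MASK32
    for _ in range(31):
        dest, q = multiply_step(multiplicand, dest, q)
    dest, q = multiply_step(multiplicand, dest, q, last=True)
    product = (dest << 32) | q                    # DEST holds the high word, Q the low
    return product - (1 << 64) if product & (1 << 63) else product
```

Each step feeds the previous DEST back in as SRCB, so the high-order half of the product accumulates in DEST while the low-order half shifts into the Q Register.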






MULTIPLU 



MULTIPLU 



Integer Multiply, Unsigned 



Operation: DEST <- SRCA * SRCB

Assembler
Syntax: MULTIPLU rc, ra, rb

Status: None

Operands: SRCA Content of register RA
SRCB Content of register RB
DEST Register RC

Instruction format: bits 31-24 opcode; bits 23-16 RC; bits 15-8 RA; bits 7-0 RB

OP = E2

Description: The SRCA operand is multiplied by the SRCB operand. The
low-order 32 bits of the 64-bit result are placed into the DEST
location. This operation treats the SRCA and SRCB operands as
unsigned integers and produces an unsigned result.

The contents of the Q Register are undefined after a MULTIPLU
operation.






MULTIPLY 



MULTIPLY 



Integer Multiply, Signed 



Operation: DEST <- SRCA * SRCB 

Assembler 

Syntax: MULTIPLY re, ra, rb 

Status: None 

Operands: SRCA Content of register RA 

SRCB Content of register RB 

DEST Register RC 



Instruction format: bits 31-24 opcode; bits 23-16 RC; bits 15-8 RA; bits 7-0 RB

OP = E0



Description: The SRCA operand is multiplied by the SRCB operand. The 
low-order 32 bits of the 64-bit result are placed into the DEST 
location. This operation treats the SRCA and SRCB operands as 
two's-complement integers and produces a two's-complement result. 

The contents of the Q register are undefined after a MULTIPLY 
operation. 






MULTM 



MULTM 



Integer Multiply Most-Significant Bits, Signed 



Operation: DEST <- SRCA * SRCB

Assembler
Syntax: MULTM rc, ra, rb

Status: None

Operands: SRCA Content of register RA
SRCB Content of register RB
DEST Register RC

Instruction format: bits 31-24 opcode; bits 23-16 RC; bits 15-8 RA; bits 7-0 RB

OP = DE

Description: The SRCA operand is multiplied by the SRCB operand. The
high-order 32 bits of the 64-bit result are placed into the DEST
location. This operation treats the SRCA and SRCB operands as
two's-complement integers and produces a two's-complement result.

The contents of the Q Register are undefined after a MULTM
operation.






MULTMU

Integer Multiply Most-Significant Bits, Unsigned

Operation: DEST <- SRCA * SRCB

Assembler

Syntax: MULTMU rc, ra, rb

Status: None

Operands: SRCA Content of register RA

SRCB Content of register RB

DEST Register RC



Instruction format:

    Bits 31-24     Bits 23-16   Bits 15-8   Bits 7-0
    OP = DF        RC           RA          RB



Description: The SRCA operand is multiplied by the SRCB operand. The 
high-order 32 bits of the 64-bit result are placed into the DEST
location. This operation treats the SRCA and SRCB operands as 
unsigned integers and produces an unsigned result. 

The contents of the Q register are undefined after a MULTMU 
operation. 
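
The difference between the signed (MULTM) and unsigned (MULTMU) high-order results can be illustrated with a small Python model. This is a sketch of the arithmetic only; the function names are hypothetical and the Q register is not modeled.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def multm(srca, srcb):
    """High-order 32 bits of the signed 64-bit product."""
    return ((to_signed32(srca) * to_signed32(srcb)) >> 32) & MASK32

def multmu(srca, srcb):
    """High-order 32 bits of the unsigned 64-bit product."""
    return (((srca & MASK32) * (srcb & MASK32)) >> 32) & MASK32

# The same bit patterns give different upper halves:
signed_hi = multm(0xFFFFFFFF, 2)     # (-1) * 2 = -2, upper half is all ones
unsigned_hi = multmu(0xFFFFFFFF, 2)  # 4294967295 * 2, upper half is 1
```

Python's `>>` on a negative integer is an arithmetic shift, which is what makes the signed case come out sign-extended before masking.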






MULU 



Multiply Step, Unsigned 



Operation: Perform one-bit step of a multiply operation (unsigned) 

Assembler 

Syntax: MULU rc, ra, rb
or
MULU rc, ra, const8

Status: V, N, Z, C

Operands: SRCA Content of register RA

SRCB M = 0: Content of register RB
M = 1: I (Zero-extended to 32 bits)

DEST Register RC



Instruction format (M is the low-order bit of OP):

    Bits 31-24     Bits 23-16   Bits 15-8   Bits 7-0
    OP = 74, 75    RC           RA          RB or I



Description: If the least-significant bit of the Q Register is 1, the SRCA operand is
added to the SRCB operand. If the least-significant bit of the Q
Register is 0, a zero word is added to the SRCB operand.

The content of the Q Register is appended to the result of the add, and
the resulting 64-bit value is shifted right by one bit position; the
carry-out of the add fills the vacated bit position. The high-order 32
bits of the 64-bit shifted value are placed into the DEST location. The
low-order 32 bits of the shifted value are placed into the Q Register.

This instruction is provided for compatibility with the Am29000 
microprocessor. 
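
The step described above can be modeled in Python as follows. This is a sketch under stated assumptions: the function names are hypothetical, the status bits (V, N, Z, C) are not modeled, and the 32-step driver loop (multiplier preloaded in Q, partial product starting at zero and fed back as SRCB) is an assumed usage pattern rather than something this section specifies.

```python
MASK32 = 0xFFFFFFFF

def mulu_step(srca, srcb, q):
    """One MULU step; returns (DEST, new Q)."""
    addend = srca if (q & 1) else 0
    total = (srcb & MASK32) + (addend & MASK32)  # bit 32 is the carry-out
    combined = (total << 32) | (q & MASK32)      # carry : sum : Q
    combined >>= 1                               # carry fills the vacated bit
    return (combined >> 32) & MASK32, combined & MASK32

def multiply_unsigned(multiplicand, multiplier):
    """Assumed usage: 32 steps yield the 64-bit product in DEST:Q."""
    partial, q = 0, multiplier & MASK32
    for _ in range(32):
        partial, q = mulu_step(multiplicand, partial, q)
    return (partial << 32) | q

product = multiply_unsigned(6, 7)
```

Each step consumes one multiplier bit from the bottom of Q and shifts one product bit in at the top, so after 32 steps Q holds the low half of the product.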






NAND 



NAND Logical 



Operation: DEST <- ~(SRCA & SRCB)

Assembler

Syntax: NAND rc, ra, rb
or
NAND rc, ra, const8

Status: N, Z

Operands: SRCA Content of register RA

SRCB M = 0: Content of register RB
M = 1: I (Zero-extended to 32 bits)

DEST Register RC



Instruction format (M is the low-order bit of OP):

    Bits 31-24     Bits 23-16   Bits 15-8   Bits 7-0
    OP = 9A, 9B    RC           RA          RB or I



Description: The SRCA operand is logically ANDed, bit-by-bit, with the SRCB
operand. The one's-complement of the result is placed into the DEST
location.
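
As an illustration of the description above, a 32-bit NAND can be modeled in Python. The function name `nand32` is hypothetical, and the N and Z status bits are not modeled. Because Python integers are unbounded, the complement must be masked back to 32 bits.

```python
MASK32 = 0xFFFFFFFF

def nand32(srca, srcb):
    """One's-complement of the bit-by-bit AND, confined to 32 bits."""
    return ~(srca & srcb) & MASK32

value = nand32(0xFFFF0000, 0xFF00FF00)  # AND is 0xFF000000
```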






NOR 



NOR Logical 



Operation: DEST <- ~(SRCA | SRCB)

Assembler

Syntax: NOR rc, ra, rb
or
NOR rc, ra, const8

Status: N, Z

Operands: SRCA Content of register RA

SRCB M = 0: Content of register RB
M = 1: I (Zero-extended to 32 bits)

DEST Register RC






Instruction format (M is the low-order bit of OP):

    Bits 31-24     Bits 23-16   Bits 15-8   Bits 7-0
    OP = 98, 99    RC           RA          RB or I









Description: The SRCA operand is logically ORed, bit-by-bit, with the SRCB
operand. The one's-complement of the result is placed into the DEST
location.






OR 



OR Logical 



Operation: DEST <- SRCA | SRCB 

Assembler 

Syntax: OR rc, ra, rb
or
OR rc, ra, const8

Status: N, Z

Operands: SRCA Content of register RA

SRCB M = 0: Content of register RB
M = 1: I (Zero-extended to 32 bits)

DEST Register RC



Instruction format (M is the low-order bit of OP):

    Bits 31-24     Bits 23-16   Bits 15-8   Bits 7-0
    OP = 92, 93    RC           RA          RB or I



Description: The SRCA operand is logically ORed, bit-by-bit, with the SRCB
operand, and the result is placed into the DEST location.






ORN 



OR-NOT Logical 



Operation: DEST <- SRCA | ~SRCB

Assembler

Syntax: ORN rc, ra, rb
or
ORN rc, ra, const8

Status: N, Z

Operands: SRCA Content of register RA

SRCB M = 0: Content of register RB
M = 1: I (Zero-extended to 32 bits)

DEST Register RC



Instruction format (M is the low-order bit of OP):

    Bits 31-24     Bits 23-16   Bits 15-8   Bits 7-0
    OP = AA, AB    RC           RA          RB or I



Description: The SRCA operand is logically ORed, bit-by-bit, with the
one's-complement of the SRCB operand, and the result is placed into
the DEST location.
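
One consequence of the operand rules above is worth illustrating: when M = 1, the 8-bit I field is zero-extended before it is complemented, so the upper 24 bits of the complemented operand are all ones. The Python sketch below models this; `orn32` is a hypothetical name and the status bits are not modeled.

```python
MASK32 = 0xFFFFFFFF

def orn32(srca, srcb):
    """SRCA ORed with the one's-complement of SRCB, confined to 32 bits."""
    return (srca | (~srcb & MASK32)) & MASK32

# Immediate form with I = 0x0F: the complement of 0x0000000F is 0xFFFFFFF0,
# so every bit above bit 3 of the result is set regardless of SRCA.
value = orn32(0x00000000, 0x0F)
```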






SETIP 



Set Indirect Pointers 



Operation: Load IPA, IPB, and IPC registers with operand-register numbers 

Assembler 

Syntax: SETIP rc, ra, rb

Status: Not affected 

Operands: Absolute-register numbers for registers RA, RB, and RC 



Instruction format:

    Bits 31-24     Bits 23-16   Bits 15-8   Bits 7-0
    OP = 9E        RC           RA          RB



Description: The IPA, IPB, and IPC registers are set to the register numbers of 
registers RA, RB, and RC, respectively. 

For programs in the User mode, a Protection Violation trap occurs if
RA, RB, or RC specifies a register that is protected by the Register 
Bank Protect Register. 






SLL 



Shift Left Logical 



Operation: DEST <- SRCA << SRCB (zero fill)

Assembler

Syntax: SLL rc, ra, rb
or
SLL rc, ra, const8

Status: Not affected

Operands: SRCA Content of register RA

SRCB M = 0: Content of register RB, bits 4...0
M = 1: I, bits 4...0

DEST Register RC



Instruction format (M is the low-order bit of OP):

    Bits 31-24     Bits 23-16   Bits 15-8   Bits 7-0
    OP = 80, 81    RC           RA          RB or I



Description: The SRCA operand is shifted left by the number of bit positions
specified by the SRCB operand; zeros fill vacated bit positions. The
result is placed into the DEST location.
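
Because only bits 4...0 of SRCB are used, the shift count is always in the range 0 to 31. A minimal Python sketch of that behavior follows; the function name `sll` is hypothetical.

```python
MASK32 = 0xFFFFFFFF

def sll(srca, srcb):
    """Shift left logical; only the low five bits of SRCB form the count."""
    count = srcb & 0x1F
    return (srca << count) & MASK32

# A count operand of 33 has bits 4...0 equal to 1, so it shifts by one place.
value = sll(0x00000001, 33)
```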






SQRT

Floating-Point Square Root

Operation: DEST <- SQRT(SRCA)

Assembler

Syntax: SQRT rc, ra, FS

Status: fpX, fpR, fpN

Operands: SRCA Content of register RA (single-precision f.p.)
or
Content of register RA and the twin of register RA
(double-precision f.p.)

DEST Register RC (single-precision f.p.)
or
Register RC and twin of Register RC (double-precision f.p.)

Control: FS Format of source operand SRCA

00 Reserved for future use

01 Single-precision floating-point

10 Double-precision floating-point

11 Reserved for future use



Instruction format:

    Bits 31-24     Bits 23-16   Bits 15-8   Bits 7-0
    OP = E5        RC           RA          Reserved, FS



Description: This operation computes the square root of floating-point operand
SRCA; the result is rounded according to the FRM field of the
Floating-Point Environment Register and placed into the DEST
location. The operand and result are single- or double-precision
floating-point numbers, as specified by FS.






SRA 



Shift Right Arithmetic 



Operation: DEST <- SRCA >> SRCB (sign fill)

Assembler

Syntax: SRA rc, ra, rb
or
SRA rc, ra, const8

Status: Not affected

Operands: SRCA Content of register RA

SRCB M = 0: Content of register RB, bits 4...0
M = 1: I, bits 4...0

DEST Register RC



Instruction format (M is the low-order bit of OP):

    Bits 31-24     Bits 23-16   Bits 15-8   Bits 7-0
    OP = 86, 87    RC           RA          RB or I



Description: The SRCA operand is shifted right by the number of bit positions
specified by the SRCB operand; the sign of the SRCA operand fills
vacated bit positions. The result is placed into the DEST location.






SRL 



Shift Right Logical 



Operation: DEST <- SRCA >> SRCB (zero fill)

Assembler

Syntax: SRL rc, ra, rb
or
SRL rc, ra, const8

Status: Not affected

Operands: SRCA Content of register RA

SRCB M = 0: Content of register RB, bits 4...0
M = 1: I, bits 4...0

DEST Register RC



Instruction format (M is the low-order bit of OP):

    Bits 31-24     Bits 23-16   Bits 15-8   Bits 7-0
    OP = 82, 83    RC           RA          RB or I



Description: The SRCA operand is shifted right by the number of bit positions
specified by the SRCB operand; zeros fill vacated bit positions. The
result is placed into the DEST location.
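
The two right shifts differ only in what fills the vacated high-order bits: SRA replicates the sign of SRCA, while SRL inserts zeros. The Python sketch below contrasts them; the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def srl(srca, srcb):
    """Shift right logical: zeros fill the vacated bit positions."""
    return (srca & MASK32) >> (srcb & 0x1F)

def sra(srca, srcb):
    """Shift right arithmetic: the sign bit fills the vacated positions."""
    value = srca & MASK32
    if value & 0x80000000:
        value -= 1 << 32              # reinterpret as a negative integer
    return (value >> (srcb & 0x1F)) & MASK32

logical = srl(0x80000000, 4)      # top four bits become zeros
arithmetic = sra(0x80000000, 4)   # top four bits copy the sign bit
```

For operands whose sign bit is 0, both functions return the same value.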






STORE 



Store 



Operation: EXTERNAL WORD [SRCB] <- SRCA

Assembler

Syntax: STORE ce, cntl, ra, rb
or
STORE ce, cntl, ra, const8

Status: Not affected

Operands: SRCA Content of register RA

SRCB M = 0: Content of register RB
M = 1: I (Zero-extended to 32 bits)



Instruction format (M is the low-order bit of OP):

    Bits 31-24     Bit 23   Bits 22-16   Bits 15-8   Bits 7-0
    OP = 1E, 1F    CE       CNTL         RA          RB or I



Description: If the CE bit is 0, the SRCA operand is placed into the external word
addressed by the SRCB operand.

If the CE bit is 1, the SRCA and SRCB operands are transferred to
the coprocessor.

The CNTL field of the STORE instruction affects the access or 
transfer as described in Sections 3.4.4 and 6.1.2. 






STOREL

Store and Lock

Operation: EXTERNAL WORD [SRCB] <- SRCA,
assert LOCK output during access

Assembler

Syntax: STOREL ce, cntl, ra, rb
or
STOREL ce, cntl, ra, const8

Status: Not affected

Operands: SRCA Content of register RA

SRCB M = 0: Content of register RB
M = 1: I (Zero-extended to 32 bits)



Instruction format (M is the low-order bit of OP):

    Bits 31-24     Bit 23   Bits 22-16   Bits 15-8   Bits 7-0
    OP = 0E, 0F    CE       CNTL         RA          RB or I



Description: If the CE bit is 0, the SRCA operand is placed into the external word
addressed by the SRCB operand.

If the CE bit is 1, the SRCA and SRCB operands are transferred to
the coprocessor.

The CNTL field of the STOREL instruction affects the access or
transfer as described in Sections 3.4.4 and 6.1.2.

The LOCK output is asserted during the access or transfer.






STOREM

Store Multiple

Operation: EXTERNAL WORD [SRCB] ... EXTERNAL WORD [SRCB + (COUNT * 4)]
<- SRCA ... SRCA + COUNT

Assembler

Syntax: STOREM ce, cntl, ra, rb
or
STOREM ce, cntl, ra, const8

Status: Not affected

Operands: SRCA Content of register RA

SRCB M = 0: Content of register RB
M = 1: I (Zero-extended to 32 bits)


Instruction format (M is the low-order bit of OP):

    Bits 31-24     Bit 23   Bits 22-16   Bits 15-8   Bits 7-0
    OP = 3E, 3F    CE       CNTL         RA          RB or I


Description: If the CE bit is 0, the contents of consecutive registers, beginning with
the SRCA operand, are placed into external words at consecutive
word addresses, beginning with the word addressed by the SRCB
operand.

If the CE bit is 1, the contents of consecutive registers, beginning with
the SRCA operand, are transferred to the coprocessor. The SRCB
operand has no pre-defined interpretation in this case.

The total number of words accessed or transferred in the sequence is
specified by the Count Remaining (CR) field of the Channel Control
Register (which also appears in the Load/Store Count Remaining
Register) at the beginning of the access. The total number of words is
the value of the CR field plus one. The CNTL field of the STOREM
instruction affects the access or transfer as described in Sections
3.4.4 and 6.1.2.

Note: The address and register-number sequences for the STOREM
instruction are specified in Section 3.4.4.
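
The relationship between the CR field and the number of words transferred (CR plus one) can be sketched in Python. This is a simplified illustration, not the definitive sequence: it assumes plain consecutive register numbers and word addresses that advance by 4 bytes, and it ignores any wraparound or other rules given in Section 3.4.4. The function name and its parameters are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def storem_transfers(first_reg, srcb, cr):
    """List (register number, word address) pairs for a CE = 0 STOREM.

    Assumption: simple consecutive sequences; Section 3.4.4 is authoritative.
    """
    count = cr + 1                      # total words = CR field plus one
    return [(first_reg + i, (srcb + 4 * i) & MASK32) for i in range(count)]

# CR = 2 gives three transfers starting at the word addressed by SRCB.
sequence = storem_transfers(64, 0x1000, 2)
```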






SUB 



Subtract 



Operation: DEST <- SRCA - SRCB

Assembler

Syntax: SUB rc, ra, rb
or
SUB rc, ra, const8

Status: V, N, Z, C

Operands: SRCA Content of register RA

SRCB M = 0: Content of register RB
M = 1: I (Zero-extended to 32 bits)

DEST Register RC



Instruction format (M is the low-order bit of OP):

    Bits 31-24     Bits 23-16   Bits 15-8   Bits 7-0
    OP = 24, 25    RC           RA          RB or I



Description: The SRCA operand is added to the two's-complement of the SRCB
operand, and the result is placed into the DEST location.






SUBC 



Subtract with Carry 



Operation: DEST <- SRCA - SRCB - 1 + C

Assembler

Syntax: SUBC rc, ra, rb
or
SUBC rc, ra, const8

Status: V, N, Z, C

Operands: SRCA Content of register RA

SRCB M = 0: Content of register RB
M = 1: I (Zero-extended to 32 bits)

DEST Register RC



Instruction format (M is the low-order bit of OP):

    Bits 31-24     Bits 23-16   Bits 15-8   Bits 7-0
    OP = 2C, 2D    RC           RA          RB or I



Description: The SRCA operand is added to the one's-complement of the SRCB
operand and the value of the ALU Status Carry bit, and the result is
placed into the DEST location.
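
The formulation above (SRCA plus the one's-complement of SRCB plus the Carry bit) is what allows subtractions wider than 32 bits to be chained. The Python sketch below illustrates a 64-bit subtraction built from a SUB on the low words followed by a SUBC on the high words. It assumes that the C status bit left by SUB is the carry-out of its add (1 meaning no borrow), which is consistent with the Operation line but is stated here as an assumption; all function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def add_with_carry(a, b, carry_in):
    """32-bit add; returns (result, carry-out)."""
    total = (a & MASK32) + (b & MASK32) + carry_in
    return total & MASK32, total >> 32

def sub(srca, srcb):
    """SRCA + two's-complement of SRCB, written as SRCA + ~SRCB + 1."""
    return add_with_carry(srca, ~srcb & MASK32, 1)

def subc(srca, srcb, c):
    """SRCA + one's-complement of SRCB + Carry bit."""
    return add_with_carry(srca, ~srcb & MASK32, c)

def sub64(a, b):
    """64-bit subtract: SUB on the low words, SUBC on the high words."""
    low, carry = sub(a & MASK32, b & MASK32)
    high, _ = subc(a >> 32, b >> 32, carry)
    return (high << 32) | low

difference = sub64(0x100000000, 1)  # borrow propagates into the high word
```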






SUBCS 



Subtract with Carry, Signed 



Operation: DEST <- SRCA - SRCB - 1 + C
IF signed overflow THEN Trap (Out of Range)

Assembler

Syntax: SUBCS rc, ra, rb
or
SUBCS rc, ra, const8

Status: V, N, Z, C

Operands: SRCA Content of register RA

SRCB M = 0: Content of register RB
M = 1: I (Zero-extended to 32 bits)

DEST Register RC


Instruction format (M is the low-order bit of OP):

    Bits 31-24     Bits 23-16   Bits 15-8   Bits 7-0
    OP = 28, 29    RC           RA          RB or I


Description: The SRCA operand is added to the one's-complement of the SRCB
operand and the value of the ALU Status Carry bit, and the result is
placed into the DEST location. If the add operation causes a
two's-complement signed overflow, an Out of Range trap occurs.

Note that the DEST location is altered whether or not an overflow
occurs.






SUBCU 



Subtract with Carry, Unsigned 



Operation: DEST <- SRCA - SRCB - 1 + C
IF unsigned underflow THEN Trap (Out of Range)

Assembler

Syntax: SUBCU rc, ra, rb
or
SUBCU rc, ra, const8

Status: V, N, Z, C

Operands: SRCA Content of register RA

SRCB M = 0: Content of register RB
M = 1: I (Zero-extended to 32 bits)

DEST Register RC



Instruction format (M is the low-order bit of OP):

    Bits 31-24     Bits 23-16   Bits 15-8   Bits 7-0
    OP = 2A, 2B    RC           RA          RB or I



Description: The SRCA operand is added to the one's-complement of the SRCB
operand and the value of the ALU Status Carry bit, and the result is
placed into the DEST location. If the add operation causes an
unsigned underflow, an Out of Range trap occurs.

Note that the DEST location is altered whether or not an underflow
occurs.
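
The trap conditions of the checked subtract-with-carry forms can be illustrated with a Python model. This is a sketch under stated assumptions: "signed overflow" is taken to mean that the exact two's-complement result does not fit in 32 bits, and "unsigned underflow" is taken to mean that the exact unsigned result is negative. Each function returns the wrapped DEST value together with a flag, mirroring the note that DEST is altered whether or not the trap occurs. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def subcs(srca, srcb, c):
    """Signed form: returns (DEST, out-of-range flag)."""
    exact = to_signed32(srca) - to_signed32(srcb) - 1 + c
    overflow = not (-(1 << 31) <= exact < (1 << 31))
    return exact & MASK32, overflow

def subcu(srca, srcb, c):
    """Unsigned form: returns (DEST, out-of-range flag)."""
    exact = (srca & MASK32) - (srcb & MASK32) - 1 + c
    return exact & MASK32, exact < 0

signed_case = subcs(0x7FFFFFFF, 0xFFFFFFFF, 1)  # largest positive minus -1
unsigned_case = subcu(3, 5, 1)                  # 3 - 5 is below zero
```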






SUBR 



Subtract Reverse 



Operation: DEST <- SRCB - SRCA

Assembler

Syntax: SUBR rc, ra, rb
or
SUBR rc, ra, const8

Status: V, N, Z, C

Operands: SRCA Content of register RA

SRCB M = 0: Content of register RB
M = 1: I (Zero-extended to 32 bits)

DEST Register RC

Instruction format (M is the low-order bit of OP):

    Bits 31-24     Bits 23-16   Bits 15-8   Bits 7-0
    OP = 34, 35    RC           RA          RB or I



Description: The SRCB operand is added to the two's-complement of the SRCA
operand and the result is placed into the DEST location.
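
Because the register operand is the one subtracted, the immediate form computes a constant minus a register; with I = 0 the result is the two's-complement negation of SRCA. A brief Python sketch of the arithmetic follows; the function name `subr` is hypothetical and the status bits are not modeled.

```python
MASK32 = 0xFFFFFFFF

def subr(srca, srcb):
    """SRCB plus the two's-complement of SRCA, confined to 32 bits."""
    return (srcb + ((~srca + 1) & MASK32)) & MASK32

negated = subr(5, 0)       # 0 - 5
reversed_diff = subr(3, 10)  # 10 - 3
```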






SUBRC 



Subtract Reverse with Carry 



Operation: DEST <- SRCB - SRCA - 1 + C

Assembler

Syntax: SUBRC rc, ra, rb
or
SUBRC rc, ra, const8

Status: V, N, Z, C

Operands: SRCA Content of register RA

SRCB M = 0: Content of register RB
M = 1: I (Zero-extended to 32 bits)

DEST Register RC



Instruction format (M is the low-order bit of OP):

    Bits 31-24     Bits 23-16   Bits 15-8   Bits 7-0
    OP = 3C, 3D    RC           RA          RB or I



Description: The SRCB operand is added to the one's-complement of the SRCA
operand and the value of the ALU Status Carry bit, and the result is
placed into the DEST location.






SUBRCS

Subtract Reverse with Carry, Signed

Operation: DEST <- SRCB - SRCA - 1 + C
IF signed overflow THEN Trap (Out of Range)

Assembler

Syntax: SUBRCS rc, ra, rb
or
SUBRCS rc, ra, const8

Status: V, N, Z, C

Operands: SRCA Content of register RA

SRCB M = 0: Content of register RB
M = 1: I (Zero-extended to 32 bits)

DEST Register RC



Instruction format (M is the low-order bit of OP):

    Bits 31-24     Bits 23-16   Bits 15-8   Bits 7-0
    OP = 38, 39    RC           RA          RB or I



Description: The SRCB operand is added to the one's-complement of the SRCA 
operand and the value of the ALU Status Carry bit, and the result is 
placed into the DEST location. If the add operation causes a
two's-complement signed overflow, an Out of Range trap occurs. 

Note that the DEST location is altered whether or not an overflow 
occurs. 






SUBRCU

Subtract Reverse with Carry, Unsigned

Operation: DEST <- SRCB - SRCA - 1 + C
IF unsigned underflow THEN Trap (Out of Range)

Assembler

Syntax: SUBRCU rc, ra, rb
or
SUBRCU rc, ra, const8

Status: V, N, Z, C

Operands: SRCA Content of register RA

SRCB M = 0: Content of register RB
M = 1: I (Zero-extended to 32 bits)

DEST Register RC



Instruction format (M is the low-order bit of OP):

    Bits 31-24     Bits 23-16   Bits 15-8   Bits 7-0
    OP = 3A, 3B    RC           RA          RB or I



Description: The SRCB operand is added to the one's-complement of the SRCA
operand and the value of the ALU Status Carry bit, and the result is
placed into the DEST location. If the add operation causes an
unsigned underflow, an Out of Range trap occurs.

Note that the DEST location is altered whether or not an underflow
occurs.




SUBRS 



Subtract Reverse, Signed 



Operation: DEST <- SRCB - SRCA
IF signed overflow THEN Trap (Out of Range)

Assembler

Syntax: SUBRS rc, ra, rb
or
SUBRS rc, ra, const8

Status: V, N, Z, C

Operands: SRCA Content of register RA

SRCB M = 0: Content of register RB
M = 1: I (Zero-extended to 32 bits)

DEST Register RC



Instruction format (M is the low-order bit of OP):

    Bits 31-24     Bits 23-16   Bits 15-8   Bits 7-0
    OP = 30, 31    RC           RA          RB or I



Description: The SRCB operand is added to the two's-complement of the SRCA
operand, and the result is placed into the DEST location. If the add
operation causes a two's-complement signed overflow, an Out of
Range trap occurs.

Note that the DEST location is altered whether or not an overflow
occurs.

SUBRU



Subtract Reverse, Unsigned 



Operation: DEST <- SRCB - SRCA
IF unsigned underflow THEN Trap (Out of Range)

Assembler

Syntax: SUBRU rc, ra, rb
or
SUBRU rc, ra, const8

Status: V, N, Z, C

Operands: SRCA Content of register RA

SRCB M = 0: Content of register RB
M = 1: I (Zero-extended to 32 bits)

DEST Register RC




Instruction format (M is the low-order bit of OP):

    Bits 31-24     Bits 23-16   Bits 15-8   Bits 7-0
    OP = 32, 33    RC           RA          RB or I







Description: The SRCB operand is added to the two's-complement of the SRCA
operand, and the result is placed into the DEST location. If the add
operation causes an unsigned underflow, an Out of Range trap
occurs.

Note that the DEST location is altered whether or not an underflow
occurs.

SUBS



Subtract, Signed

Operation: DEST <- SRCA - SRCB
IF signed overflow THEN Trap (Out of Range)

Assembler

Syntax: SUBS rc, ra, rb
or
SUBS rc, ra, const8

Status: V, N, Z, C

Operands: SRCA Content of register RA

SRCB M = 0: Content of register RB
M = 1: I (Zero-extended to 32 bits)

DEST Register RC



Instruction format (M is the low-order bit of OP):

    Bits 31-24     Bits 23-16   Bits 15-8   Bits 7-0
    OP = 20, 21    RC           RA          RB or I

Description: The SRCA operand is added to the two's-complement of the SRCB
operand, and the result is placed into the DEST location. If the add
operation causes a two's-complement signed overflow, an Out of
Range trap occurs.

Note that the DEST location is altered whether or not an overflow
occurs.

SUBU



Subtract, Unsigned

Operation: DEST <- SRCA - SRCB
IF unsigned underflow THEN Trap (Out of Range)

Assembler

Syntax: SUBU rc, ra, rb
or
SUBU rc, ra, const8

Status: V, N, Z, C

Operands: SRCA Content of register RA

SRCB M = 0: Content of register RB
M = 1: I (Zero-extended to 32 bits)

DEST Register RC



Instruction format (M is the low-order bit of OP):

    Bits 31-24     Bits 23-16   Bits 15-8   Bits 7-0
    OP = 22, 23    RC           RA          RB or I

Description: The SRCA operand is added to the two's-complement of the SRCB
operand, and the result is placed into the DEST location. If the add
operation causes an unsigned underflow, an Out of Range trap
occurs.

Note that the DEST location is altered whether or not an underflow
occurs.

XNOR



Exclusive-NOR Logical 



Operation: DEST <- ~(SRCA ^ SRCB)

Assembler

Syntax: XNOR rc, ra, rb
or
XNOR rc, ra, const8

Status: N, Z

Operands: SRCA Content of register RA

SRCB M = 0: Content of register RB
M = 1: I (Zero-extended to 32 bits)

DEST Register RC



Instruction format (M is the low-order bit of OP):

    Bits 31-24     Bits 23-16   Bits 15-8   Bits 7-0
    OP = 96, 97    RC           RA          RB or I



Description: The SRCA operand is logically exclusive-ORed, bit-by-bit, with the
SRCB operand. The one's-complement of the result is placed into the
DEST location.






XOR 



Exclusive-OR Logical 



Operation: DEST <- SRCA ^ SRCB

Assembler

Syntax: XOR rc, ra, rb
or
XOR rc, ra, const8

Status: N, Z

Operands: SRCA Content of register RA

SRCB M = 0: Content of register RB
M = 1: I (Zero-extended to 32 bits)

DEST Register RC



Instruction format (M is the low-order bit of OP):

    Bits 31-24     Bits 23-16   Bits 15-8   Bits 7-0
    OP = 94, 95    RC           RA          RB or I



Description: The SRCA operand is logically exclusive-ORed, bit-by-bit, with the
SRCB operand, and the result is placed into the DEST location.

8.5 INSTRUCTION INDEX BY OPERATION CODE


01       CONSTN     Constant, Negative
02       CONSTH     Constant, High
03       CONST      Constant
04       MTSRIM     Move to Special Register Immediate
05       CONSTHZ    Constant High, Zero Lower
06,07    LOADL      Load and Lock
08,09    CLZ        Count Leading Zeros
0A,0B    EXBYTE     Extract Byte
0C,0D    INBYTE     Insert Byte
0E,0F    STOREL     Store and Lock
10,11    ADDS       Add, Signed
12,13    ADDU       Add, Unsigned
14,15    ADD        Add
16,17    LOAD       Load
18,19    ADDCS      Add with Carry, Signed
1A,1B    ADDCU      Add with Carry, Unsigned
1C,1D    ADDC       Add with Carry
1E,1F    STORE      Store
20,21    SUBS       Subtract, Signed
22,23    SUBU       Subtract, Unsigned
24,25    SUB        Subtract
26,27    LOADSET    Load and Set
28,29    SUBCS      Subtract with Carry, Signed
2A,2B    SUBCU      Subtract with Carry, Unsigned
2C,2D    SUBC       Subtract with Carry
2E,2F    CPBYTE     Compare Bytes
30,31    SUBRS      Subtract Reverse, Signed
32,33    SUBRU      Subtract Reverse, Unsigned
34,35    SUBR       Subtract Reverse
36,37    LOADM      Load Multiple
38,39    SUBRCS     Subtract Reverse with Carry, Signed
3A,3B    SUBRCU     Subtract Reverse with Carry, Unsigned
3C,3D    SUBRC      Subtract Reverse with Carry
3E,3F    STOREM     Store Multiple
40,41    CPLT       Compare Less Than
42,43    CPLTU      Compare Less Than, Unsigned
44,45    CPLE       Compare Less Than or Equal To
46,47    CPLEU      Compare Less Than or Equal To, Unsigned
48,49    CPGT       Compare Greater Than
4A,4B    CPGTU      Compare Greater Than, Unsigned
4C,4D    CPGE       Compare Greater Than or Equal To
4E,4F    CPGEU      Compare Greater Than or Equal To, Unsigned
50,51    ASLT       Assert Less Than
52,53    ASLTU      Assert Less Than, Unsigned
54,55    ASLE       Assert Less Than or Equal To
56,57    ASLEU      Assert Less Than or Equal To, Unsigned
58,59    ASGT       Assert Greater Than
5A,5B    ASGTU      Assert Greater Than, Unsigned
5C,5D    ASGE       Assert Greater Than or Equal To
5E,5F    ASGEU      Assert Greater Than or Equal To, Unsigned
60,61    CPEQ       Compare Equal To
62,63    CPNEQ      Compare Not Equal To
64,65    MUL        Multiply Step
66,67    MULL       Multiply Last Step
68,69    DIV0       Divide Initialize
6A,6B    DIV        Divide Step
6C,6D    DIVL       Divide Last Step
6E,6F    DIVREM     Divide Remainder
70,71    ASEQ       Assert Equal To
72,73    ASNEQ      Assert Not Equal To
74,75    MULU       Multiply Step, Unsigned
78,79    INHW       Insert Half-Word
7A,7B    EXTRACT    Extract Word, Bit-Aligned
7C,7D    EXHW       Extract Half-Word
7E       EXHWS      Extract Half-Word, Sign-Extended
80,81    SLL        Shift Left Logical
82,83    SRL        Shift Right Logical
86,87    SRA        Shift Right Arithmetic
88       IRET       Interrupt Return
89       HALT       Enter HALT Mode
8C       IRETINV    Interrupt Return and Invalidate
90,91    AND        AND Logical
92,93    OR         OR Logical
94,95    XOR        Exclusive-OR Logical
96,97    XNOR       Exclusive-NOR Logical
98,99    NOR        NOR Logical
9A,9B    NAND       NAND Logical
9C,9D    ANDN       AND-NOT Logical
9E       SETIP      Set Indirect Pointers
9F       INV        Invalidate
A0,A1    JMP        Jump
A4,A5    JMPF       Jump False
A8,A9    CALL       Call Subroutine
AA,AB    ORN        OR-NOT Logical
AC,AD    JMPT       Jump True
B4,B5    JMPFDEC    Jump False and Decrement
B6       MFTLB      Move from Translation Look-Aside Buffer Register
BE       MTTLB      Move to Translation Look-Aside Buffer Register
BF                  Reserved for emulation (trap vector number 28)
C0       JMPI       Jump Indirect
C4       JMPFI      Jump False Indirect
C6       MFSR       Move from Special Register
C8       CALLI      Call Subroutine, Indirect
CC       JMPTI      Jump True Indirect
CE       MTSR       Move to Special Register
CF-D6               Reserved for emulation (trap vector number 28)
D7       EMULATE    Trap to Software Emulation Routine
D8       FMAC       Floating-Point Multiply-Accumulate, Single-Precision
D9       DMAC       Floating-Point Multiply-Accumulate, Double-Precision
DA       FMSM       Floating-Point Multiply-Sum, Single-Precision
DB       DMSM       Floating-Point Multiply-Sum, Double-Precision
DC-DD               Reserved for emulation (trap vector numbers 28-29)
DE       MULTM      Integer Multiply Most-Significant Bits, Signed
DF       MULTMU     Integer Multiply Most-Significant Bits, Unsigned
E0       MULTIPLY   Integer Multiply, Signed
E1       DIVIDE     Integer Divide, Signed
E2       MULTIPLU   Integer Multiply, Unsigned
E3       DIVIDU     Integer Divide, Unsigned
E4       CONVERT    Convert Data Format
E5       SQRT       Square Root
E6       CLASS      Classify Floating-Point Operand
E7                  Reserved for emulation (trap vector number 39)
E8       MTACC      Move to Accumulator
E9       MFACC      Move from Accumulator
EA       FEQ        Floating-Point Equal To, Single-Precision
EB       DEQ        Floating-Point Equal To, Double-Precision
EC       FGT        Floating-Point Greater Than, Single-Precision
ED       DGT        Floating-Point Greater Than, Double-Precision
EE       FGE        Floating-Point Greater Than or Equal To, Single-Precision
EF       DGE        Floating-Point Greater Than or Equal To, Double-Precision
F0       FADD       Floating-Point Add, Single-Precision
F1       DADD       Floating-Point Add, Double-Precision
F2       FSUB       Floating-Point Subtract, Single-Precision
F3       DSUB       Floating-Point Subtract, Double-Precision
F4       FMUL       Floating-Point Multiply, Single-Precision
F5       DMUL       Floating-Point Multiply, Double-Precision
F6       FDIV       Floating-Point Divide, Single-Precision
F7       DDIV       Floating-Point Divide, Double-Precision
F8                  Reserved for emulation (trap vector number 56)
F9       FDMUL      Floating-Point Multiply, Single-to-Double-Precision
FA-FF               Reserved for emulation (trap vector numbers 58-63)
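
As an illustration of how this index might be used, the Python sketch below splits a 32-bit instruction word into the four 8-bit fields shown in the three-register instruction formats (operation code in bits 31-24, then RC, RA, and RB or I) and looks up the mnemonic. Only a handful of entries from the index are included, the dictionary and function names are hypothetical, and the field split does not apply to instructions that use a different format.

```python
# A few entries copied from the index; both opcodes of a pair map to one name.
OPCODES = {
    0x14: "ADD", 0x15: "ADD",
    0x24: "SUB", 0x25: "SUB",
    0x80: "SLL", 0x81: "SLL",
    0xDE: "MULTM",
    0xE0: "MULTIPLY",
}

def decode(word):
    """Return (mnemonic, RC field, RA field, RB-or-I field)."""
    op = (word >> 24) & 0xFF
    rc = (word >> 16) & 0xFF
    ra = (word >> 8) & 0xFF
    rb_or_i = word & 0xFF
    return OPCODES.get(op, "unknown"), rc, ra, rb_or_i

fields = decode(0xE0818283)  # operation code E0 with fields 81, 82, 83
```

For the paired operation codes, the odd code is the M = 1 form, so `op & 1` distinguishes a register operand from an immediate.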
APPENDIX A



CHANNEL OPERATION TIMING



Table A-1   Signal Summary

Signal Name   Signal Function                  Type(1)                Synch/Async
A(31-0)       Address Bus                      3-State Output         Synch
BGRT          Bus Grant                        Output                 Synch
BINV          Bus Invalid                      Output                 Synch
BREQ          Bus Request                      Input                  Synch
CDA           Coprocessor Data Accept          Input                  Synch
CNTL(1-0)     CPU Control                      Input                  Async
D(31-0)       Data Bus                         Bi-directional         Synch
DBACK         Data Burst Acknowledge           Input                  Synch
DBREQ         Data Burst Request               3-State Output         Synch
DERR          Data Error                       Input                  Synch
DRDY          Data Ready                       Input                  Synch
DREQ          Data Request                     3-State Output         Synch
DREQT(1-0)    Data Request Type                3-State Output         Synch
I(31-0)       Instruction Bus                  Input                  Synch
IBACK         Instruction Burst Acknowledge    Input                  Synch
IBREQ         Instruction Burst Request        3-State Output         Synch
IERR          Instruction Error                Input                  Synch
INCLK         Input Clock                      Input                  N/A
INTR(3-0)     Interrupt Request                Input                  Async
IRDY          Instruction Ready                Input                  Synch
IREQ          Instruction Request              3-State Output         Synch
IREQT         Instruction Request Type         3-State Output         Synch
LOCK          Lock                             3-State Output         Synch
MPGM(1-0)     MMU Programmable                 3-State Output         Synch
MSERR         Master/Slave Error               Output                 Synch
OPT(2-0)      Option Control                   3-State Output         Synch
PDA           Pipelined Data Access            3-State Output         Synch
PEN           Pipeline Enable                  Input                  Synch
PIN169        Hardware-Development System      Alignment              N/A
PIA           Pipelined Instruction Access     3-State Output         Synch
PWRCLK        SYSCLK Power                     N/A                    N/A
R/W           Read/Write                       3-State Output         Synch
RESET         Reset                            Input                  Async
STAT(2-0)     CPU Status                       Output                 Synch
SUP/US        Supervisor/User Mode             3-State Output         Synch
SYSCLK        System Clock                     Bi-directional         N/A
TEST          Test Mode                        Input                  Async
TRAP(1-0)     Trap Request                     Input                  Async
WARN          Warn                             Edge-Sensitive Input   Async

(1) The signals labeled "3-state output" and "bi-directional" (except SYSCLK) are disabled when the
channel is granted to an external master. All outputs (except MSERR) may be disabled by
asserting the TEST input.

Figure A-1 Instruction Read - Simple Access
Figure A-2 Instruction Read - Simple Access with IRDY Delayed
Figure A-3 Instruction Read - Pipelined Access
Figure A-4 Instruction Read - Establishing Burst-Mode Access
Figure A-5 Instruction Read - Burst-Mode Access Suspended by Slave
Figure A-6 Instruction Read - Burst-Mode Access Preempted by Slave

[Timing diagram not recovered from the scan. Legible labels: 1 or More Cycles, Address N + 2,
IBREQ, BINV, IRDY, ERR, PEN, IBACK.]



A-8 CHANNEL OPERATION TIMING 



Figure A-7 Instruction Read — Burst-Mode Access Suspended by Master 
Figure A-8 Instruction Read - Burst-Mode Access Suspended by Master and Later Preempted by Slave
Figure A-9 Instruction Read - Burst-Mode Access Canceled by Slave*
Figure A-10 Instruction Read - Burst-Mode Access Ended by Master (Preempted, Terminated, or Canceled)
Figure A-11 Instruction Read - TLB Miss or Protection Violation
Figure A-12 Instruction Read - Pipelined Access with TLB Miss or Protection Violation
Figure A-13 Instruction Read - Error Detected by Slave*
Figure A-14 Data Read - Simple Access



[Timing diagram not recovered from the scan. Signals shown: SYSCLK, A(31-0), SUP/US, LOCK,
MPGM(1-0), OPT(2-0), DREQT(1-0), R/W, DREQ, PDA, DBREQ, BINV, D(31-0), DRDY, DERR, PEN, DBACK.]



Figure A-15 Data Write — Simple Access 
Figure A-16 Data Read - Simple Access with DRDY Delayed
Figure A-17 Data Write - Simple Access with DRDY Delayed
Figure A-18 Data Read Followed by Data Write - Simple Access
Figure A-19 Load and Set Instruction
Figure A-20 Data Read - Pipelined Access
Figure A-21 Data Write - Pipelined Access
Figure A-22 Data Read Followed by Data Write - Pipelined Access (Not Used by Processor)
Figure A-23 Data Write Followed by Data Read - Pipelined Access
Figure A-24 Data Read - Establishing Burst-Mode Access
Figure A-25 Data Write - Establishing Burst-Mode Access
Figure A-26 Data Read - Burst-Mode Access Suspended by Slave
Figure A-27 Data Write - Burst-Mode Access Suspended by Slave
Figure A-28 Data Read - Burst-Mode Access Suspended by Master (Not Used by Processor)
Figure A-29 Data Write - Burst-Mode Access Suspended by Master (Not Used by Processor)
Figure A-30 Data Res
Figure A-31 Data Write-
Figure A-32 Data Read - Burst-Mode Access Suspended by Master and Later Preempted by Slave (Not Used by Processor)
Figure A-33 Data Write - Burst-Mode Access Suspended by Master and Later Preempted by Slave (Not Used by Processor)
Figure A-34 Data Read - Burst-Mode Access Canceled by Slave*
Figure A-35 Data Write - Burst-Mode Access Canceled by Slave*
Figure A-36 Data Read - Burst-Mode Access Ended by Master (Preempted, Terminated, or Canceled)
Figure A-37 Data Write - Burst-Mode Access Ended by Master (Preempted, Terminated, or Canceled)
Figure A-38 Data Read - TLB Miss or Protection Violation
Figure A-39 Data Write - TLB Miss or Protection Violation
Figure A-40 Data Read - Pipelined Access with TLB Miss or Protection Violation
Figure A-41 Data Write - Pipelined Access with TLB Miss or Protection Violation
Figure A-42 Data Read-
Figure A-43 Data Write - Error Detected by Slave
Figure A-44 Channel Transfer from Processor to External Master
Figure A-45 Channel Transfer from External Master to Processor
Figure B-2 Register Bank Organization
Figure B-3 Special Purpose Registers
Figure B-3 Special Purpose Registers (continued)
Figure B-4 Special Purpose Registers
Table B-1    Register Field Summary

Label   Field Name                                Register                               Bit
ACF     Accumulator Format                        Floating-Point Environment
        Bank Protection Bit                       Register Bank Protect
        Bank 1 Protection Bit                     Register Bank Protect                  1
B2      Bank 2 Protection Bit                     Register Bank Protect
                                                  Register Bank Protect
        Bank 12 Protection Bit
                                                  Register Bank Protect
                                                  Register Bank Protect
        Breakpoint Enable
        Breakpoint Has Occurred                   Instruction Breakpoint Control 0, 1    12
        Byte Order
        Byte Pointer                              ALU Status
                                                  Byte Pointer                           1-0
BPID    Breakpoint Process Identifier             Instruction Breakpoint Control 0, 1    7-0
BRM     Break ROM                                 Instruction Breakpoint Control 0, 1    9
BSY     Break or Synchronize                      Instruction Breakpoint Control 0, 1    10
BTE     Break on Translate Enabled                Instruction Breakpoint Control 0, 1    8
        Carry                                     ALU Status                             7
CA      Coprocessor Active                        Current Processor Status               15
                                                  Old Processor Status                   15
CD      Branch Target Cache Memory Disable        Configuration                          1
CE      Coprocessor Enable                        Channel Control                        31
CHA     Channel Address                           Channel Address                        31-0
CHD     Channel Data                              Channel Data                           31-0
CNTL    Control                                   Channel Control                        30-24
CO      Branch Target Cache Memory Organization   Configuration                          6
CP      Coprocessor Present                       Configuration                          1
CR      Load/Store Count Remaining                Channel Control                        23-16
                                                  Load/Store Count Remaining             7-0
CV      Contents Valid                            Channel Control
DA      Disable All Interrupts and Traps          Current Processor Status
                                                  Old Processor Status
DF      Divide Flag                               ALU Status                             11
Table B-1    Register Field Summary (continued)

Label   Field Name                                Register                               Bit
DI      Disable Interrupts                        Current Processor Status               1
                                                  Old Processor Status                   1
DM      Floating-Point Divide By Zero Mask        Floating-Point Environment             5
DO      Integer Division Overflow Mask            Integer Environment                    1
DS      Floating-Point Divide By Zero Sticky      Floating-Point Status                  5
DT      Floating-Point Divide By Zero Trap        ALU Status                             13
DW      Data Width Enable                         Configuration                          5
EE      Early Load Enable                         Configuration                          7
FF      Fast Floating-Point Select                Floating-Point Environment             8
FRM     Floating-Point Round Mode                 Floating-Point Environment             7-6
FC      Funnel Shift Count                        ALU Status                             4-0
                                                  Funnel Shift Count                     4-0
FZ      Freeze                                    Current Processor Status               10
                                                  Old Processor Status                   10
IBA     Instruction Breakpoint Address            Instruction Breakpoint Address 0, 1    31-2
IE      Interrupt Enable                          Timer Reload                           24
IM      Interrupt Mask                            Old Processor Status                   3-2
                                                  Current Processor Status               3-2
IN      Interrupt                                 Timer Reload                           25
IO      Input/Output                              Region Mapping Control 0, 1            16
                                                  TLB Entry Word 1
IOP     Instruction Opcode                        Exception Opcode                       7-0
IP      Interrupt Pending                         Current Processor Status               14
                                                  Old Processor Status                   14
IPA     Indirect Pointer A                        Indirect Pointer A                     9-2
IPB     Indirect Pointer B                        Indirect Pointer B                     9-2
IPC     Indirect Pointer C                        Indirect Pointer C                     9-2
LA      Lock Active                               Channel Control                        12
LK      Lock                                      Current Processor Status               9
                                                  Old Processor Status                   9
LRU     Least-Recently Used Entry                 LRU Recommendation                     6-1
LS      Load/Store                                Channel Control                        15
ML      Multiple Operation                        Channel Control                        14
MM      Monitor Mode                              Current Processor Status               16
MO      Integer Multiplication Overflow Mask      Integer Environment
N       Negative                                  ALU Status                             9
NM      Floating-Point Invalid Operation Mask     Floating-Point Environment
NN      Not Needed                                Channel Control                        1
NS      Floating-Point Invalid Sticky             Floating-Point Status
NT      Floating-Point Invalid Operation Trap     Floating-Point Status                  8
OV      Overflow                                  Timer Reload                           26
Table B-1    Register Field Summary (continued)

Label   Field Name                                Register                               Bit
PBA     Physical Base Address                     Region Mapping Address 0, 1            15-0
PC0     Program Counter                           Program Counter                        31-2
PC1     Program Counter 1                         Program Counter 1                      31-2
PC2     Program Counter 2                         Program Counter 2                      31-2
PD      Physical Addressing Data                  Current Processor Status               6
                                                  Old Processor Status                   6
PGM     User Programmable                         Region Mapping Control 0, 1            23-22
                                                  TLB Entry Word 1                       7-6
PI      Physical Addressing Instructions          Current Processor Status               5
                                                  Old Processor Status                   5
PID     Process Identifier                        MMU Configuration                      7-0
PRL     Processor Release Level                   Configuration                          31-24
PS      Page Size                                 MMU Configuration                      9-8
Q       Quotient/Multiplier                       Q Register                             31-0
RE      ROM Enable                                Current Processor Status               8
                                                  Old Processor Status                   8
RGS     Region Size                               Region Mapping Control 0, 1            20-17
RM      Floating-Point Reserved Operand Mask      Floating-Point Environment             1
RPN     Real Page Number                          TLB Entry Word 1                       31-10
RS      Floating-Point Reserved Operand Sticky    Floating-Point Status                  1
RSN     Reason Vector                             Reason Vector                          7-0
RT      Floating-Point Reserved Operand Trap      Floating-Point Status                  9
RV      ROM Vector Area                           Configuration                          3
SE      Supervisor Execute                        Region Mapping Control 0, 1            11
                                                  TLB Entry Word                         11
SM      Supervisor Mode                           Current Processor Status               4
                                                  Old Processor Status                   4
SPC0    Shadow Program Counter                    Shadow Program Counter                 31-2
SPC1    Shadow Program Counter 1                  Shadow Program Counter 1               31-2
SPC2    Shadow Program Counter 2                  Shadow Program Counter 2               31-2
SR      Supervisor Read                           Region Mapping Control 0, 1            13
                                                  TLB Entry Word                         13
ST      Set                                       Channel Control                        13
SW      Supervisor Write                          Region Mapping Control 0, 1            12
                                                  TLB Entry Word                         12
TCV     Timer Count Value                         Timer Counter                          23-0
TE      Trace Enable                              Current Processor Status               13
                                                  Old Processor Status                   13
TF      Transaction Faulted                       Channel Control                        10
TID     Task Identifier                           Region Mapping Control 0, 1            7-0
                                                  TLB Entry Word                         7-0
TP      Trace Pending                             Current Processor Status               12
                                                  Old Processor Status                   12
TR      Target Register                           Channel Control                        9-2
TRV     Timer Reload Value                        Timer Reload                           23-0
Table B-1    Register Field Summary (continued)

Label   Field Name                                Register                               Bit
TU      Trap Unaligned Access                     Current Processor Status               11
                                                  Old Processor Status                   11
U       Usage                                     TLB Entry Word 1                       1
UE      User Execute                              Region Mapping Control 0, 1            8
                                                  TLB Entry Word                         8
UM      Floating-Point Underflow Mask             Floating-Point Environment             3
UR      User Read                                 Region Mapping Control 0, 1            10
                                                  TLB Entry Word                         10
US      Floating-Point Underflow Sticky           Floating-Point Status                  3
UT      Floating-Point Underflow Trap             Floating-Point Status                  11
UW      User Write                                Region Mapping Control 0, 1            9
                                                  TLB Entry Word                         9
V       Overflow                                  ALU Status                             10
VAB     Vector Area Base                          Vector Area Base Address               31-10
VBA     Virtual Base Address                      Region Mapping Address 0, 1            31-16
VE      Valid Entry                               Region Mapping Control 0, 1            14
                                                  TLB Entry Word                         14
VF      Vector Fetch                              Configuration                          4
VM      Floating-Point Overflow Mask              Floating-Point Environment             2
VS      Floating-Point Overflow Sticky            Floating-Point Status                  2
VT      Floating-Point Overflow Trap              Floating-Point Status                  10
VTAG    Virtual Tag                               TLB Entry Word                         31-15
WM      Wait Mode                                 Current Processor Status               7
                                                  Old Processor Status                   7
XM      Floating-Point Inexact Result Mask        Floating-Point Environment             4
XS      Floating-Point Inexact Result Sticky      Floating-Point Status                  4
XT      Floating-Point Inexact Result Trap        Floating-Point Status                  12
Z       Zero                                      ALU Status                             8
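
The bit positions in Table B-1 can be applied directly when decoding a register value read
from the processor. The following is a minimal sketch, not part of the original manual. It
assumes that bit 0 is the least-significant bit of a 32-bit register, and the helper names
(extract_field, decode_alu_status) are hypothetical. Only ALU Status fields whose bit
positions are legible in the table are included.

```python
def extract_field(value: int, high: int, low: int) -> int:
    """Return bits high..low (inclusive) of a 32-bit register value.

    Assumption: bit 0 is the least-significant bit.
    """
    if not (0 <= low <= high <= 31):
        raise ValueError("bit range must satisfy 0 <= low <= high <= 31")
    width = high - low + 1
    return (value >> low) & ((1 << width) - 1)


# (high, low) bit positions copied from the ALU Status rows of Table B-1.
ALU_STATUS_FIELDS = {
    "DF": (11, 11),  # Divide Flag
    "V": (10, 10),   # Overflow
    "N": (9, 9),     # Negative
    "Z": (8, 8),     # Zero
    "FC": (4, 0),    # Funnel Shift Count
}


def decode_alu_status(value: int) -> dict:
    """Split an ALU Status value into the fields listed above."""
    return {
        name: extract_field(value, high, low)
        for name, (high, low) in ALU_STATUS_FIELDS.items()
    }
```

For example, decode_alu_status(0x00000D1F) reports DF, V, and Z set, N clear, and a funnel
shift count of 31.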
C.1 TIMING

Table C-1 lists the latency of each single- and double-precision floating-point operation
and each integer multiplication operation. Latency is the minimum time that must
elapse after an instruction is issued before its result can be used as an input operand
of a subsequent operation.

Table C-1    Latency of Floating-Point and Integer Multiply Operations

                                     Latency
Operation (2)              (Cycles)   (ns @ 40 MHz)
CLASS (s.p., d.p.)         4          100             (1)
CONVERT (int -> s.p.)      4          100
CONVERT (int -> d.p.)      4          100
CONVERT (f.p. -> int)      3          75
CONVERT (f.p. -> f.p.)     3/4        75/100          (3)
DADD                       3/4        75/100          (3)
DDIV                       18         450             (1)
DEQ                        3          50
DGE                        3          50
DGT                        3          50
DMAC                       9          225
DMSM                       9          225
DMUL                       6          150             (1)
DSUB                       3/4        75/100          (3)
FADD                       3/4        75/100          (3)
FDIV                       11         275             (1)
FDMUL                      3          75              (1)
FEQ                        3          50
FGE                        3          50
FGT                        3          50
FMAC                       6          150
FMSM                       6          150
FMUL                       3          75              (1)
FSUB                       3/4        75/100          (3)
MFACC                      3          75
MTACC                      3/4        75/100          (3)
SQRT (s.p.)                28         700             (1)
SQRT (d.p.)                57         1425
MULTIPLU                   3          75
MULTIPLY                   3          75
MULTM                      3          75
MULTMU                     3          75

Notes: 1. Requires additional cycles for wrapping/unwrapping of denormalized input/output
          operands (see Table C-3).

       2. int = integer; s.p. = single-precision floating-point; d.p. = double-precision
          floating-point.

       3. The extra cycle is required for renormalization when the input operands cause
          massive cancellation.

Table C-2 lists the repeat time of floating-point operations and integer multiplication
operations. An instruction with a repeat time of N can be issued every N cycles.

Table C-3 shows the effect of denormalized source operands and results on instruction
latency and issue rate.

If no dependencies or functional unit conflicts exist, then an instruction can be issued.
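
The difference between latency and repeat time can be made concrete with a short
calculation. The sketch below is illustrative only and is not taken from the manual. It
assumes a 40-MHz clock (25 ns per cycle), that the first operation issues at cycle 0, and
that nothing other than the stated latency and repeat time delays issue. The function names
are hypothetical. The DDIV figures (latency 18, repeat time 17) come from Tables C-1 and C-2.

```python
CLOCK_MHZ = 40  # clock rate quoted in the Table C-1 column heading


def cycles_to_ns(cycles: int, clock_mhz: float = CLOCK_MHZ) -> float:
    """Convert a cycle count to nanoseconds at the given clock rate."""
    return cycles * 1000.0 / clock_mhz


def independent_ops_cycles(count: int, latency: int, repeat: int) -> int:
    """Cycles until the last of `count` independent, identical operations
    has a usable result, when a new one issues every `repeat` cycles."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return (count - 1) * repeat + latency


def dependent_chain_cycles(count: int, latency: int, repeat: int) -> int:
    """Cycles for `count` operations where each consumes the previous result.

    Each operation must wait for both the previous result (latency) and the
    functional unit (repeat time), so the spacing is the larger of the two.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    return (count - 1) * max(latency, repeat) + latency
```

With these assumptions, three independent DDIV operations finish in 2 * 17 + 18 = 52 cycles,
while three DDIV operations that each depend on the previous result take 3 * 18 = 54 cycles.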

Table C-2    Repeat Time of Floating-Point Operations

                           Repeat Time: Start New
Operation (see note)       Operation Every N Cycles (N)
CLASS (s.p., d.p.)
CONVERT (int -> s.p.)      1
CONVERT (int -> d.p.)      1
CONVERT (f.p. -> int)      1
CONVERT (f.p. -> f.p.)     1
DADD                       1
DDIV                       17
DEQ                        1
DGE                        1
DGT                        1
DMAC                       4
DMSM                       4
DMUL                       4
DSUB
FADD
FDIV                       10
FDMUL
FEQ
FGE
FGT
FMAC
FMSM
FMUL
FSUB
MFACC
MTACC
SQRT (s.p.)                27
SQRT (d.p.)                56
MULTIPLU                   1
MULTIPLY                   1
MULTM                      1
MULTMU                     1

Notes: int = integer
       s.p. = single-precision floating-point
       d.p. = double-precision floating-point

Table C-3    Effect on Latency of Denormalized Source Operands or Results

                                                   Latency Increase (1)
Denormalized Operand Status                        Cycles    (ns @ 40 MHz)
One denormalized source operand                    +4        +160
Two denormalized source operands                   +5        +200
Denormalized result                                +4        +160 (2)
Two denormalized source operands following
an operation with a denormalized result            +6        +240

Notes: 1. Only the instructions CLASS, FMUL, DMUL, FDMUL, FDIV, DDIV, and SQRT require extra
          cycles to process denormalized numbers. Denormalized number processing uses the
          adder, increasing by one cycle per denorm the latency of any instruction being
          issued to the adder at the same time.

       2. Unwrapping of denormalized results is pipelined with other operations.

C.2 EXCEPTIONS

In most cases, operations produce non-exceptional results, i.e., results that are
equal to the infinitely precise result, rounded to the destination format. This section
describes results produced in exceptional cases, as well as other details pertaining to
the floating-point implementation.

The following terms are used in the classification of exceptions:

∞                 Infinity, a floating-point number comprising a maximum biased
                  exponent, a zero fraction, and a sign bit of 1 or 0. +∞ indicates a
                  positive infinity, -∞ a negative infinity.

0                 Zero, a floating-point number comprising a zero biased exponent, a zero
                  fraction, and a sign bit of 1 or 0. +0 indicates a zero with a sign bit
                  of 0; -0 indicates a zero with a sign bit of 1.

AQNaN             An AMD Quiet Not-a-Number comprising a maximum biased exponent,
                  a fraction of 11000...0, and a sign bit of 0. AQNaN is the only
                  NaN reported as an operation result.

Denorm            A denormalized floating-point number; a non-zero number that is
                  too small to be represented as a normalized floating-point number.

FNum              A finite, non-zero floating-point number. +FNum indicates a positive
                  FNum; -FNum indicates a negative FNum.

I0                Integer zero, an integer word consisting entirely of zeros.

IMaxNeg           The largest negative number representable in 32-bit, 2's-complement
                  integer format. IMaxNeg has a value of 80000000, hexadecimal.

IMaxPos           The largest positive number representable in 32-bit, 2's-complement
                  integer format. IMaxPos has a value of 7fffffff, hexadecimal.

Inexact Result    An exception indicating one of the following:

                  • A rounded result of an operation not equal to the
                    infinitely-precise result;

                  • An overflowed operation with the overflow exception
                    trap disabled (VM = 1); or

                  • In fast-float mode, a non-zero intermediate result
                    converted to a final result of zero.
Infinitely Precise Result
                  The result of an operation, computed as if the exponent
                  range and precision were unbounded.

Intermediate Result
                  The result of an operation before rounding. For the purpose
                  of describing exception handling, the intermediate result
                  can be thought of as being equal to the infinitely-precise
                  result.

Invalid Operation An exception indicating that the source operand or operands
                  are invalid for the operation to be performed, e.g., the
                  operation ∞ times 0.

Max               The largest representable finite floating-point number.
                  +Max indicates the largest positive finite number, -Max the
                  largest negative finite number.

NonZ              A non-zero floating-point number. A NonZ can be either an
                  FNum or an infinity. +NonZ indicates a positive NonZ,
                  -NonZ a negative NonZ.

Overflow          An exception indicating that the rounded result of an operation
                  is too large to be expressed in the destination format.

Reserved Operand  An exception indicating that an operation producing a
                  numeric result has a reserved operand (NaN) as either a
                  source operand or result.

RResult           A result produced by rounding the infinitely-precise result.

Sign (x)          The sign of operand x.

UIMax             The largest representable, 32-bit, unsigned integer quantity.
                  UIMax has a value of ffffffff, hexadecimal.

Underflow         An exception indicating that the rounded result of an operation
                  is too small to be represented in the destination format.
                  There are two different sets of underflow criteria, depending
                  on whether or not the underflow trap or fast-float mode
                  is enabled:

                  Underflow trap masked and fast-float mode disabled
                  (UM = 1 and FF = 0): An operation result underflows if a
                  non-zero intermediate result is too small to be represented
                  as a normalized number and the rounded result is inexact.

                  Underflow trap unmasked or fast-float mode enabled
                  (UM = 0 or FF = 1): An operation result underflows if a
                  non-zero intermediate result is too small to be represented
                  as a normalized number.

The tables in Sections C.2.1 through C.2.12 list the exception classes relevant to
each floating-point operation, and the results and exception status reported for a
variety of conditions. The following shorthand is used to describe the status bits set in
the floating-point status register:

Notation    Status bits
N           NS, NT
R           RS, RT
V           VS, VT
U           US, UT
X           XS, XT
D           DS, DT
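
The operand classes defined in the glossary above (infinity, zero, NaN, Denorm, FNum) map
onto bit patterns in a simple way. The following sketch classifies a 32-bit single-precision
pattern; it is illustrative only and assumes the standard IEEE single-precision layout (1
sign bit, 8 exponent bits, 23 fraction bits). The function names are hypothetical, and
Denorm is reported separately from FNum purely for clarity. The AQNaN constant follows the
glossary definition: maximum biased exponent, fraction 11000...0, sign bit 0.

```python
import struct

SIGN_MASK = 0x80000000
EXP_MASK = 0x7F800000
FRAC_MASK = 0x007FFFFF
AQNAN_SINGLE = 0x7FE00000  # max exponent, fraction 1100...0, sign 0


def float_to_bits(x: float) -> int:
    """Return the single-precision bit pattern of x as an unsigned integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def classify_single(bits: int) -> str:
    """Classify a single-precision bit pattern using the glossary's terms."""
    sign = "-" if bits & SIGN_MASK else "+"
    exponent = (bits & EXP_MASK) >> 23
    fraction = bits & FRAC_MASK
    if exponent == 0xFF:  # maximum biased exponent
        return sign + "inf" if fraction == 0 else "NaN"
    if exponent == 0:  # zero biased exponent
        return sign + "0" if fraction == 0 else sign + "Denorm"
    return sign + "FNum"


def is_aqnan_single(bits: int) -> bool:
    """True only for the exact AQNaN pattern described in the glossary."""
    return bits == AQNAN_SINGLE
```

For example, classify_single(float_to_bits(1.0)) yields "+FNum", and the pattern 0x00000001
is classified as "+Denorm".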
Note that a sticky status bit (NS, RS, VS, US, XS, or DS) is set only if the corresponding
exception mask bit in the Floating-Point Environment Register is set, except when
a sticky status bit is set by a DMAC, DMSM, FMAC, FMSM, or MTACC instruction.
Note also that the state of the trap status bits (NT, RT, VT, UT, XT, or DT) is valid
only if a Floating-Point Exception trap is taken by the operation in question.

In most cases, exceptional conditions have been divided into two groups: input exceptions,
for which the exception is due to inappropriate operands, and output exceptions,
for which the exception can be detected only at the conclusion of an operation.

In the tables that follow, exceptions are prioritized in the following order, from the
highest to lowest priority:

1. Invalid operation, reserved operand    <- highest priority

2. Divide by Zero

3. Overflow, Underflow

4. Inexact Result                         <- lowest priority
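
The ordering above can be sketched as a simple priority lookup. This is an illustration
only, not the processor's actual logic: the exception names are hypothetical string labels,
and exceptions listed on the same line of the priority list are treated as sharing one
priority level.

```python
# Priority levels from highest to lowest, following the list above.
PRIORITY_LEVELS = [
    {"invalid_operation", "reserved_operand"},
    {"divide_by_zero"},
    {"overflow", "underflow"},
    {"inexact_result"},
]


def governing_exceptions(raised) -> set:
    """Return the highest-priority exception(s) among those raised.

    Returns an empty set when nothing was raised. Unknown names are rejected
    so that typos do not silently fall through.
    """
    raised = set(raised)
    known = set().union(*PRIORITY_LEVELS)
    unknown = raised - known
    if unknown:
        raise ValueError(f"unknown exception names: {sorted(unknown)}")
    for level in PRIORITY_LEVELS:
        hit = raised & level
        if hit:
            return hit
    return set()
```

An operation that raises both overflow and inexact result is therefore governed by
overflow, since overflow sits on a higher line of the list.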



The result and status for a given exceptional operation are determined by the highest-priority
exception. If, for example, an operation produces both overflow and inexact
result exceptions, the overflow exception, having higher priority, determines the behavior
of the operation. The behavior of this operation is therefore described by the
Overflow entry of the Output Exception table for the operation in question.

The tables that follow list some cases that do not result in a status bit being set.
These cases are not considered exceptional by the IEEE Binary Floating-Point Standard,
and are listed here merely for the sake of completeness.

C.2.1 Addition (FADD, DADD)



Input Exceptions: FADD, DADD

                                        SRCB
SRCA       SNaN         QNaN         +∞           -∞           FNum, 0
SNaN       AQNaN N,R    AQNaN N,R    AQNaN N,R    AQNaN N,R    AQNaN N,R
QNaN       AQNaN N,R    AQNaN R      AQNaN R      AQNaN R      AQNaN R
+∞         AQNaN N,R    AQNaN R      +∞ none      AQNaN R      +∞ none
-∞         AQNaN        AQNaN        AQNaN        -∞           -∞
FNum, 0    AQNaN N,R    AQNaN R      +∞ none      -∞ none
Output Exceptions: FADD, DADD

Exception        Conditions                              Result     Status
Overflow         VM = 1   sign +   FRM =, +∞             +∞         V,X
                                   FRM 0, -∞             +Max       V,X
                          sign -   FRM =, -∞             -∞         V,X
                                   FRM 0, +∞             -Max       V,X
                 VM = 0   Exact Result                   (NW)       V
                          Inexact Result                 (NW)       V,X
Underflow        UM = 1   FF = 1                         ±0 (1)     U,X
                          FF = 0                         N/A        N/A
                 UM = 0   FF = 1                         (NW)       U,X
                          FF = 0   Exact Result          (NW)       U
                                   Inexact Result        N/A        N/A
Inexact Result   XM = 1                                  RResult    X
                 XM = 0                                  (NW)       X

Notes: N/A = Not applicable; addition cannot underflow for these conditions.
(NW) = Result not written; contents of destination register unchanged.
(1) = Zero has sign of intermediate result.
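
The masked-overflow rows of the table above (VM = 1) choose between an infinity and the
largest finite number according to the result sign and the rounding mode. The sketch below
illustrates that choice for single precision. It is a hedged illustration rather than the
processor's definition: the rounding-mode names are hypothetical labels, and the mapping of
the table's FRM conditions onto them is an assumption based on IEEE directed-rounding
behavior, in which an overflow rounds to infinity only when the mode rounds away from zero
for that sign.

```python
import math

# Largest finite single-precision value: (2 - 2**-23) * 2**127.
MAX_SINGLE = (2 - 2 ** -23) * 2.0 ** 127

ROUNDING_MODES = ("nearest", "toward_minus_inf", "toward_plus_inf", "toward_zero")


def masked_overflow_result(negative: bool, rounding: str,
                           max_value: float = MAX_SINGLE) -> float:
    """Result delivered for an overflow when the overflow trap is masked."""
    if rounding not in ROUNDING_MODES:
        raise ValueError(f"unknown rounding mode: {rounding}")
    if not negative:
        # Positive overflow: +infinity for nearest or toward +infinity, else +Max.
        if rounding in ("nearest", "toward_plus_inf"):
            return math.inf
        return max_value
    # Negative overflow: -infinity for nearest or toward -infinity, else -Max.
    if rounding in ("nearest", "toward_minus_inf"):
        return -math.inf
    return -max_value
```

In each of these cases the table also reports the V and X status notation, since a masked
overflow counts as an inexact result.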
C.2.2 Subtraction (FSUB, DSUB)



Input Exceptions: FSUB, DSUB | 


SRCA 






SRCB 








SNan 


QNaN 


+- 


1 —CO 


j FNum, 


SNan 


AQNaN 


AQNaN 


AQNaN 


! AQNaN 


! AQNaN 




N.R [_ 


N.R 


[_ N,R 


L_ N,R 


L N,R 


QNaN 


AQNaN 1 


AQNaN 


1 AQNaN 


1 AQNaN 


1 AQNaN 




AQNaN 1 


R 


Jh-JL__ 

1 AQNaN 


1 +00 


_L — 5. — 

1 +00 


+00 


AQNaN 




N.R l_ 
AQNaN 1 


R 


1 —00 


1 none 
1 AQNaN 


' none 
1 —00 




AQNaN 




N,R__ ' 

AQNaN 1 


R 


1 none 


j NJR_ 

1 +00 


[_ none 


FNum.O 


AQNaN 




N,R j 


R 


j none 


j none 
Output Exceptions: FSUB, DSUB

Exception        Conditions                              Result     Status
Overflow         VM = 1   sign +   FRM =, +∞             +∞         V,X
                                   FRM 0, -∞             +Max       V,X
                          sign -   FRM =, -∞             -∞         V,X
                                   FRM 0, +∞             -Max       V,X
                 VM = 0   Exact Result                   (NW)       V
                          Inexact Result                 (NW)       V,X
Underflow        UM = 1   FF = 1                         ±0 (1)     U,X
                          FF = 0                         N/A        N/A
                 UM = 0   FF = 1                         (NW)       U,X
                          FF = 0   Exact Result          (NW)       U
                                   Inexact Result        N/A        N/A
Inexact Result   XM = 1                                  RResult    X
                 XM = 0                                  (NW)       X

Notes: N/A = Not applicable; addition cannot underflow for these conditions.
(NW) = Result not written; contents of destination register unchanged.
(1) = Zero has sign of intermediate result.
C.2.3 Multiplication (FMUL, DMUL, FDMUL)



Input Exceptions: FMUL, DMUL FDMUL | 


SRCA 


SRCB 




SNaN j QNaN [ +00 j -^ \ ^ \ f'Num 


SNan 


AQNaN 1 AQNaN 1 AQNaN 1 AQNaN 1 AQNaN 1 AQNaN 




N.R 1 N,R 1 N,R 1 N.R 1 N.R 1 N,R 


QNaN 


AQNaN 1 AQNaN | AQNaN | AQNaN | AQNaN | AQNaN 


+00 


N,R 1 R 1 R 1 R 1 R 1 R 
AQNaN 1 AQNaN | +00 | -00 | AQNaN | +00 




N,R 1 R 1 none 1 none 1 N.R 1 none 


CX3 


AQNaN 1 AQNaN | -<« 1 +00 | AQNaN | -00 




N,R 4. ^ ! "^"® L "°"® L ''^ 1 "°"® 
AQNaN 1 AQNaN | AQNaN | AQNaN | ±0(1) | ±0(1) 







N,R 1 R 1 N,R 1 N,R 1 none 1 none 


FNum.O 


AQNaN 1 AQNaN t ±^W T ±^(1) T ±0(1) T 




N,R 1 R 1 none 1 none 1 1 
Output Exceptions: FMUL, DMUL

Exception        Conditions                              Result     Status
Overflow         VM = 1   sign +   FRM =, +∞             +∞         V,X
                                   FRM 0, -∞             +Max       V,X
                          sign -   FRM =, -∞             -∞         V,X
                                   FRM 0, +∞             -Max       V,X
                 VM = 0   Exact Result                   (NW)       V
                          Inexact Result                 (NW)       V,X
Underflow        UM = 1   FF = 1                         ±0 (1)     U,X
                          FF = 0                         RResult    U,X
                 UM = 0   FF = 1                         (NW)       U,X
                          FF = 0   Exact Result          (NW)       U
                                   Inexact Result        (NW)       U,X
Inexact Result   XM = 1                                  RResult    X
                 XM = 0                                  (NW)       X

Notes: (NW) = Result not written; contents of destination register unchanged.
(1) = Zero has sign of intermediate result.

The operation FDMUL produces no output exceptions.



C.2.4 Division (FDIV, DDIV)












Input Exceptions: FDiV, DDiV | 


SRCA 
(dividend) 


SNaN { 


QNaN 


SRCB (divisor) 

1 00 









FNum 


SNan 


AQNaN 1 
« N,R 1 


AQNaN 
N,R 


1 AQNaN 
L «N.R__ _ 


L.. -., 


AQNaN 
N.R 


Im.. ,',„ 


AQNaN 
N.R 


QNaN 
00 




AQNaN 1 
N,R 1 

AQNaN r 
N,R 1 

AQNaN 1 
N,R 1 


AQNaN 
R 

AQNaN 
R 

AQNaN 
R 


1 AQNaN 
1 R 

r AQNaN 
1 R 

1 ±0(1) 
1 none 




AQNaN 
R 

" ±00 (1) 
none 

' AQNaN 
N.R 


-1 

-1 


AQNaN 
R 

"~±^liy 

none 

±0(1) 
none 


FNum 


AQNaN r 
N,R 1 


AQNaN 
R 


r ±0(1) 
1 none 


1 


±00 (1) 
D 


1 





Note: (1) Result sign is XOR of sign(SRCA) and sign(SRCB).
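
The sign rule in the note above can be checked directly on bit patterns. The following is a
minimal sketch assuming single-precision operands with the sign in bit 31; the helper names
are hypothetical.

```python
import struct


def float_bits(x: float) -> int:
    """Return the single-precision bit pattern of x as an unsigned integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def quotient_sign_bit(srca_bits: int, srcb_bits: int) -> int:
    """Sign bit of SRCA / SRCB: the XOR of the two operand sign bits."""
    return ((srca_bits ^ srcb_bits) >> 31) & 1
```

Because only the sign bits are examined, the same rule covers zero and infinite operands,
for example a negative dividend with a positive zero divisor gives a negative result sign.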
Output Exceptions: FDIV, DDIV

Exception        Conditions                              Result     Status
Overflow         VM = 1   sign +   FRM =, +∞             +∞         V,X
                                   FRM 0, -∞             +Max       V,X
                          sign -   FRM =, -∞             -∞         V,X
                                   FRM 0, +∞             -Max       V,X
                 VM = 0   Exact Result                   (NW)       V
                          Inexact Result                 (NW)       V,X
Underflow        UM = 1   FF = 1                         ±0 (1)     U,X
                          FF = 0                         RResult    U,X
                 UM = 0   FF = 1                         (NW)       U,X
                          FF = 0   Exact Result          (NW)       U
                                   Inexact Result        (NW)       U,X
Inexact Result   XM = 1                                  RResult    X
                 XM = 0                                  (NW)       X

Notes: (NW) = Result not written; contents of destination register unchanged.
(1) = Zero has sign of intermediate result.
C.2.5 Comparison (FEQ, DEQ, FGE, DGE, FGT, DGT)



Input Exceptions: FEQ, DEQ

                                 SRCB
SRCA          SNaN          QNaN          ∞, FNum, 0
SNaN          FALSE N       FALSE N       FALSE N
QNaN          FALSE N       FALSE none    FALSE none
∞, FNum, 0    FALSE N       FALSE none

Input Exceptions: FGE, DGE, FGT, DGT

                                 SRCB
SRCA          SNaN          QNaN          ∞, FNum, 0
SNaN          FALSE N       FALSE N       FALSE N
QNaN          FALSE N       FALSE N       FALSE N
∞, FNum, 0    FALSE N       FALSE N



Floating-point comparison operations produce no output exceptions. 



C.2.6 Multiply-Accumulate (FMAC, DMAC), Multiply-Sum (FMSM, DMSM)

Input Exceptions: FMAC, DMAC, FMSM, DMSM

OP1, OP2                                     OP3      Result    Status
one or more operands is an SNaN                       AQNaN     N,R
no SNaNs, one or more operands is a QNaN              AQNaN     R
(∞ * 0)                                      X        AQNaN     N,R
(+NonZ * +∞) or (-NonZ * -∞)                 -∞       AQNaN     N,R
(-NonZ * +∞) or (+NonZ * -∞)                 +∞       AQNaN     N,R
(+NonZ * +∞) or (-NonZ * -∞)                 +∞       +∞        none
(-NonZ * +∞) or (+NonZ * -∞)                 -∞       -∞        none
(+NonZ * +∞) or (-NonZ * -∞)                 FNum     +∞        none
(-NonZ * +∞) or (+NonZ * -∞)                 FNum     -∞        none
FNum, 0      FNum, 0                         +∞       +∞        none
FNum, 0      FNum, 0                         +∞       +∞        none

Notes: X = don't care.

OP1 and OP2 are commutative, i.e., (A * B) will produce results identical to (B * A).
Output Exceptions: FMAC, DMAC, FMSM, DMSM

Exception        Conditions                     Result     Status
Overflow         sign +   FRM =, +∞             +∞         V,X
                          FRM 0, -∞             +Max       V,X
                 sign -   FRM =, -∞             -∞         V,X
                          FRM 0, +∞             -Max       V,X
Underflow (2)                                   ±0 (1)     U,X

Notes: 1. Zero has sign of intermediate result.

2. The underflow criterion for these operations is the same as that for fast-float mode; an
operation result underflows if a non-zero intermediate result is too small to be represented
as a normalized number.



Multiply-accumulate-based operations (FMAC/DMAC and FMSM/DMSM) do not
support gradual underflow. Denormalized input operands are converted to a zero of
the same sign, and underflowed results are converted to a zero having the sign of the
intermediate result.
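
The flush-to-zero rule for denormalized inputs can be illustrated on IEEE 754 single-precision bit patterns. The following Python sketch is illustrative only: the function names are hypothetical, and it models the stated rule (a denormalized operand becomes a zero of the same sign), not the processor's internal datapath.

```python
import struct

SIGN_MASK = 0x80000000  # single-precision sign bit
EXP_MASK = 0x7F800000   # single-precision exponent field
FRAC_MASK = 0x007FFFFF  # single-precision fraction field


def flush_denormal_single(bits: int) -> int:
    """Replace a denormalized single-precision pattern with a zero of the
    same sign; every other pattern is returned unchanged."""
    is_denormal = (bits & EXP_MASK) == 0 and (bits & FRAC_MASK) != 0
    return bits & SIGN_MASK if is_denormal else bits


def bits_to_float(bits: int) -> float:
    """Reinterpret a 32-bit pattern as a single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


# Smallest positive denormal is flushed to +0; its negative twin to -0.
print(hex(flush_denormal_single(0x00000001)))  # 0x0
print(hex(flush_denormal_single(0x80000001)))  # 0x80000000
```

Normalized numbers, zeros, infinities, and NaNs pass through this helper untouched, which matches the text: only denormalized operands are altered on input.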

For multiply-accumulate-based operations, the contents of special registers IPA, IPB,
IPC, and the Exception Opcode Register may not reflect the operands and opcode of
the faulting instruction after a Floating-Point Exception trap is taken.



C.2.7 Square Root (SQRT)



Input Exceptions: SQRT

SRCA          Result   Status
SNaN          AQNaN    N,R
QNaN          AQNaN    R
+∞            +∞       none
-FNum, -∞     AQNaN    N,R
+0            +0       none
-0            -0       none

Output Exceptions: SQRT

Exception        Conditions   Result    Status
Inexact Result   XM = 1       RResult   X
                 XM = 0       (NW)      X

Note: (NW) = Result not written; contents of destination register unchanged.



C.2.8 Floating-Point-to-Floating-Point Conversions (CONVERT)



Input Exceptions: CONVERT, f.p. -> f.p.

SRCA     Result    Status
SNaN     AQNaN     N,R
QNaN     AQNaN     R
∞        ±∞ (1)    none
0        ±0 (1)    none

Note: (1) = Result has sign of operand.

Output Exceptions: CONVERT, f.p. -> f.p.

Exception        Conditions                               Result    Status
Overflow         VM = 1   sign +   FRM = nearest, +∞      +∞        V,X
                                   FRM = zero, -∞         +Max      V,X
                          sign -   FRM = nearest, -∞      -∞        V,X
                                   FRM = zero, +∞         -Max      V,X
                 VM = 0   Exact Result                    (NW)      V
                          Inexact Result                  (NW)      V,X
Underflow        UM = 1   FF = 1                          ±0 (1)    U,X
                          FF = 0                          RResult   U,X
                 UM = 0   FF = 1                          (NW)      U,X
                          FF = 0   Exact Result           (NW)      U
                                   Inexact Result         (NW)      U,X
Inexact Result   XM = 1                                   RResult   X
                 XM = 0                                   (NW)      X

Notes: (NW) = Result not written; contents of destination register unchanged.
       (1) = Zero has sign of intermediate result.
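
The overflow rows of the table above depend only on the sign of the result and the rounding mode. The following Python sketch restates that selection for the masked case (VM = 1, result written). The function name and the string labels for the rounding modes are hypothetical and chosen for illustration; they are not register encodings.

```python
ROUND_MODES = ("nearest", "+inf", "-inf", "zero")  # illustrative labels only


def overflow_result(negative: bool, frm: str) -> str:
    """Return the symbolic result written on a masked overflow.

    Rounding to nearest, or toward the infinity that has the same sign as
    the result, yields an infinity. The other two modes yield the
    largest-magnitude finite number of that sign (+Max or -Max).
    """
    if frm not in ROUND_MODES:
        raise ValueError(f"unknown rounding mode: {frm!r}")
    if negative:
        return "-inf" if frm in ("nearest", "-inf") else "-Max"
    return "+inf" if frm in ("nearest", "+inf") else "+Max"


for mode in ROUND_MODES:
    print(mode, overflow_result(False, mode), overflow_result(True, mode))
```

In every one of these cases the status reported by the table is V,X; the sketch only covers which value is written.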



C.2.9 Integer-to-Floating-Point Conversions (CONVERT) 

Input Exceptions: Integer-to-floating-point conversions produce no input exceptions.

Output Exceptions: CONVERT, integer -> f.p.

Exception        Conditions   Result    Status
Inexact Result   XM = 1       RResult   X
                 XM = 0       (NW)      X

Note: (NW) = Result not written; contents of destination register unchanged.






C.2.10 Floating-Point-to-Integer Conversions (CONVERT)



Input Exceptions: CONVERT, f.p. -> signed integer

SRCA     Result     Status
SNaN     I0         N,R
QNaN     I0         N,R
+∞       IMaxPos    N
-∞       IMaxNeg    N
+0       I0         none
-0       I0         none


Input Exceptions: CONVERT, f.p. -> unsigned integer

SRCA     Result     Status
SNaN     I0         N,R
QNaN     I0         N,R
+∞       UIMax      N
-∞       I0         N
+0       I0         none
-0       I0         none
-FNum    I0         N


Output Exceptions: CONVERT, f.p. -> signed integer

Exception        Conditions                  Result    Status
Overflow         VM = 1   sign +             IMaxPos   N
                          sign -             IMaxNeg   N
                 VM = 0   Exact Result       (NW)      N
                          Inexact Result     (NW)      N
Inexact Result   XM = 1                      RResult   X
                 XM = 0                      (NW)      X

Note: (NW) = Result not written; contents of destination register unchanged.



Output Exceptions: CONVERT, f.p. -> unsigned integer

Exception        Conditions                  Result    Status
Overflow         VM = 1                      UIMax     N
                 VM = 0   Exact Result       (NW)      N
                          Inexact Result     (NW)      N
Inexact Result   XM = 1                      RResult   X
                 XM = 0                      (NW)      X

Note: (NW) = Result not written; contents of destination register unchanged.
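
The input-exception rows for floating-point-to-integer conversion can be summarized as a small lookup. The following Python sketch is an illustration under stated assumptions: the function name is hypothetical, and the numeric values given for IMaxPos, IMaxNeg, and UIMax are assumed to be the 32-bit limits, which this section does not spell out. Operands that are not listed in the input tables return None, because the normal conversion path (rounding and output exceptions) is not modeled.

```python
import math

I_MAX_POS = 0x7FFFFFFF   # assumed: largest 32-bit signed integer
I_MAX_NEG = -0x80000000  # assumed: most negative 32-bit signed integer
UI_MAX = 0xFFFFFFFF      # assumed: largest 32-bit unsigned integer


def convert_input_exception(x: float, signed: bool):
    """Return (result, status) for an operand listed in the input-exception
    tables, or None when the operand takes the normal conversion path."""
    if math.isnan(x):
        return 0, "N,R"  # SNaN and QNaN both produce integer zero
    if math.isinf(x):
        if x > 0:
            return (I_MAX_POS if signed else UI_MAX), "N"
        return (I_MAX_NEG if signed else 0), "N"
    if x == 0.0:
        return 0, "none"  # +0 and -0 both produce integer zero
    if not signed and x < 0.0:
        return 0, "N"  # negative finite operand, unsigned destination
    return None


print(convert_input_exception(float("inf"), signed=False))
print(convert_input_exception(-2.5, signed=False))
```

Python floats do not distinguish signaling from quiet NaNs here, which is harmless for this sketch because both rows of the tables give the same result and status.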

C.2.11 Move From Accumulator (MFACC)



Input Exceptions: MFACC

SRCA     Result    Status
QNaN     AQNaN     R
∞        ±∞ (1)    none
0        ±0 (1)    none

Note: (1) = Result has sign of operand.



Output Exceptions: MFACC

Exception        Conditions                               Result    Status
Overflow         VM = 1   sign +   FRM = nearest, +∞      +∞        V,X
                                   FRM = zero, -∞         +Max      V,X
                          sign -   FRM = nearest, -∞      -∞        V,X
                                   FRM = zero, +∞         -Max      V,X
                 VM = 0   Exact Result                    (NW)      V
                          Inexact Result                  (NW)      V,X
Underflow        UM = 1   FF = 1                          ±0 (1)    U,X
                          FF = 0                          RResult   U,X
                 UM = 0   Exact Result                    (NW)      U
                          Inexact Result                  (NW)      U,X
Inexact Result   XM = 1                                   RResult   X
                 XM = 0                                   (NW)      X

Notes: (NW) = Result not written; contents of destination register unchanged.
       (1) = Zero has sign of intermediate result.

C.2.12 Move To Accumulator (MTACC)



Input Exceptions: MTACC

SRCA     Result    Status
SNaN     AQNaN     N,R
QNaN     AQNaN     R
∞        ±∞ (1)    none
0        ±0 (1)    none

Note: 1. Result has sign of operand.



Output Exceptions: MTACC

Exception        Conditions                     Result    Status
Overflow         sign +   FRM = nearest, +∞     +∞        V,X
                          FRM = zero, -∞        +Max      V,X
                 sign -   FRM = nearest, -∞     -∞        V,X
                          FRM = zero, +∞        -Max      V,X
Underflow (1)                                   ±0 (2)    U,X
Inexact Result                                  RResult   X

Notes: 1. Underflow is detected only at the output of the operation;
          denormalized inputs are not flushed to zero. The output underflow
          detection criterion is the same as for fast float mode; an
          operation result underflows if a non-zero intermediate result is
          too small to be represented as a normalized number.
       2. Zero has sign of intermediate result.



C.2.13 Classify (CLASS)

The CLASS operation does not produce exceptions.



C.2.14 Integer Multiply (MULTIPLY, MULTIPLU, MULTM, MULTMU) 

Integer multiplication operations MULTIPLY, MULTIPLU, MULTM, and MULTMU do 
not affect the ALU Status Register or the Floating-Point Status Register. 

For the MULTIPLY and MULTIPLU instructions, overflow of the 32-bit result can be
detected by trapping on overflow. When the Integer Multiply Overflow Mask (MO) bit
is 0, the MULTIPLY instruction causes an Out of Range trap when it produces a signed
result that exceeds 32 bits (a positive number larger than 7fffffff, hexadecimal, or a
negative number smaller than 80000000, hexadecimal). Similarly, the MULTIPLU
instruction causes an Out of Range trap when it produces an unsigned result that
exceeds 32 bits (a positive number greater than ffffffff, hexadecimal). The MULTM
and MULTMU instructions cannot overflow, and are unaffected by the MO bit.
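
The overflow conditions above can be checked with arbitrary-precision arithmetic. The following Python sketch is illustrative only: the function names are hypothetical, the operands are taken as already-interpreted signed or unsigned 32-bit values, and it assumes, following the word "Similarly" in the text, that the same mask bit governs MULTIPLU.

```python
INT32_MIN = -0x80000000
INT32_MAX = 0x7FFFFFFF
UINT32_MAX = 0xFFFFFFFF


def multiply_out_of_range(a: int, b: int, mo: int) -> bool:
    """Signed MULTIPLY: True when an Out of Range trap would be taken,
    i.e. the mask bit is 0 and the product does not fit in 32 signed bits."""
    product = a * b
    return mo == 0 and not (INT32_MIN <= product <= INT32_MAX)


def multiplu_out_of_range(a: int, b: int, mo: int) -> bool:
    """Unsigned MULTIPLU: True when the mask bit is 0 and the product is
    greater than ffffffff hexadecimal."""
    product = a * b
    return mo == 0 and product > UINT32_MAX


print(multiply_out_of_range(0x10000, 0x8000, mo=0))   # product is 2**31
print(multiplu_out_of_range(0x10000, 0x10000, mo=0))  # product is 2**32
```

With the mask bit set to 1, both helpers report no trap regardless of the product.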

C.2.15 Integer Divide (DIVIDE, DIVIDU)

Integer division operations DIVIDE and DIVIDU do not affect the ALU Status Register 
or the Floating-Point Status Register. Each produces a quotient QUOT and remainder 
REM such that Euclid's Equation is always satisfied for non-exceptional cases, that is: 

Dividend = (Divisor • QUOT) + REM 

If QUOT is non-zero, its sign is the exclusive-OR of the signs of the dividend and
divisor. If the infinitely-precise quotient cannot be expressed as an integer, it is
truncated toward zero. That is, QUOT is the integer closest to and no greater in
magnitude than the infinitely-precise result.

If REM is non-zero, it has the sign of the dividend. 

DIVIDE and DIVIDU always take the Out of Range trap when the divisor is 0; QUOT
and REM are undefined.

Overflow of the 32-bit quotient can be detected by trapping on overflow. When the
Integer Divide Overflow Mask bit is 0, the DIVIDE instruction causes an Out of Range
trap when it produces a signed quotient that exceeds 32 bits (a positive number larger
than 7fffffff, hexadecimal, or a negative number smaller than 80000000,
hexadecimal). Similarly, the DIVIDU instruction causes an Out of Range trap when it
produces an unsigned result that exceeds 32 bits (a positive number greater than
ffffffff, hexadecimal). QUOT and REM are undefined for an overflowing integer
divide, regardless of whether overflow trapping is enabled.

Note that this behavior is generated by the DIVIDE and DIVIDU instruction emulation 
software. 
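
The quotient and remainder rules of this section (truncation toward zero, remainder taking the sign of the dividend, and Euclid's Equation) can be sketched as follows. This Python sketch is illustrative only: the function name is hypothetical, a Python exception stands in for the Out of Range trap, and quotient overflow is not modeled.

```python
def divide_truncating(dividend: int, divisor: int):
    """Return (QUOT, REM) with QUOT truncated toward zero.

    Python's own // operator rounds toward negative infinity, so the
    quotient is formed from magnitudes and the sign applied afterwards.
    """
    if divisor == 0:
        # Stand-in for the Out of Range trap; QUOT and REM are undefined.
        raise ZeroDivisionError("Out of Range trap: divisor is 0")
    quot = abs(dividend) // abs(divisor)
    if (dividend < 0) != (divisor < 0):
        quot = -quot  # sign is the exclusive-OR of the operand signs
    rem = dividend - divisor * quot  # Euclid's Equation, solved for REM
    return quot, rem


print(divide_truncating(-7, 2))  # (-3, -1)
print(divide_truncating(7, -2))  # (-3, 1)
```

In both examples the remainder carries the sign of the dividend, and Dividend = (Divisor * QUOT) + REM holds.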



C.3 TRAPS 

The following floating-point instructions take the Floating-Point Exception trap (vector 
number 0x16) upon producing an unmasked exception: 

CONVERT    DMUL     FGE
DADD       DSUB     FGT
DDIV       FADD     FMUL
DEQ        FDIV     FSUB
DGE        FDMUL    MFACC
DGT        FEQ      SQRT

The instructions FMAC, DMAC, FMSM, DMSM, and MTACC do not take the
Floating-Point Exception trap upon producing an unmasked exception.

The time at which a floating-point exception trap is taken depends on the type of the
exception causing the trap. The Invalid Operation, Reserved Operand, and Divide by
Zero exceptions cause a trap to be taken after the first cycle of the execute stage,
since they can be determined at the beginning of an operation. The Overflow,
Underflow, and Inexact Result exceptions cause a trap to be taken after the last cycle
of the execute stage. This timing is characteristic of the Am29050 microprocessor
hardware implementation; other 29K Family processors may exhibit different trap timing.
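
The trap rules of this section can be collected into a small lookup. The following Python sketch is illustrative only: the function name and the returned strings are hypothetical, and it simply restates the instruction lists and the two timing classes given above for an unmasked exception.

```python
# Instructions that take the Floating-Point Exception trap.
TRAPPING = {
    "CONVERT", "DADD", "DDIV", "DEQ", "DGE", "DGT",
    "DMUL", "DSUB", "FADD", "FDIV", "FDMUL", "FEQ",
    "FGE", "FGT", "FMUL", "FSUB", "MFACC", "SQRT",
}
# Instructions that do not take the trap.
NON_TRAPPING = {"FMAC", "DMAC", "FMSM", "DMSM", "MTACC"}

# Exceptions that can be determined at the beginning of an operation.
EARLY = {"Invalid Operation", "Reserved Operand", "Divide by Zero"}
# Exceptions that are known only when the result is available.
LATE = {"Overflow", "Underflow", "Inexact Result"}


def trap_timing(mnemonic: str, exception: str):
    """Return when the trap is taken for an unmasked exception, or None
    when the instruction does not take the trap."""
    if mnemonic in NON_TRAPPING:
        return None
    if mnemonic not in TRAPPING:
        raise ValueError(f"not a listed floating-point instruction: {mnemonic}")
    if exception in EARLY:
        return "after first execute cycle"
    if exception in LATE:
        return "after last execute cycle"
    raise ValueError(f"unknown exception: {exception}")


print(trap_timing("FDIV", "Divide by Zero"))
print(trap_timing("FMAC", "Overflow"))
```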




A Floating-Point Exception trap cannot be caused by writing to the Floating-Point
Environment Register or the Floating-Point Status Register. For example, it is not
possible to cause a floating-point exception trap by unmasking a currently set
exception.

When the DA bit of the Current Processor Status Register is 1, any arithmetic
exception that would otherwise produce a Floating-Point Exception trap or Out of
Range trap will instead cause a Monitor trap. In all other respects, however, the
processor behaves as described in Section 3.5.10.